137 Коммитов

Автор SHA1 Сообщение Дата
Benjamin Welton 1517a398bf [rocprofiler-sdk] Buffer finalization fixes and HSA ABI 0x09 support (#2318)
* [rocprofiler-sdk] Fix buffer flush ordering and sanitizer CI improvements

Buffer Pool Design
------------------
Replace the fixed array-based double buffer with a dynamic pool design to
fix race conditions that caused "internal correlation id was retired
prematurely" errors.

The original design had a race where flush callbacks could be delivered
out-of-order: when buffer 0 fills and begins flushing, writes go to
buffer 1. If buffer 1 fills before buffer 0's flush completes, the
buffer index wraps back to 0 (which may still be flushing). Independent
flush tasks submitted to the thread pool can complete out of order.

The new pool design:
- Uses a std::deque of buffer instances that grows as needed
- Allocates buffers from the pool when the current buffer needs to flush
- Serializes flushes with a mutex to ensure FIFO callback ordering
- Returns buffers to the pool after flush completion
- Eliminates the race between buffer selection and write operations

New Unit Tests
--------------
- buffer_correlation_ordering.cpp: Tests that API records are always
  delivered before their corresponding retirement records
- buffer_ordering_stress.cpp: Stress tests buffer flush ordering under
  high contention with multiple threads rapidly filling buffers

HSA Tool Hooks
--------------
Added hsa_tool_hooks.cpp/hpp to register an HSA OnUnload callback that
waits for pending flush tasks before tool finalization, preventing
"retired prematurely" errors during HSA shutdown.

Sanitizer Improvements
----------------------
- LSAN: Set fast_unwind_on_malloc=1 to prevent deadlock in libgcc unwinder
- LSAN: Added suppressions for external tools (liblzma, liblsan, seq, strdup)
- TSAN: Added suppression for false positive on C++11 thread-safe static
  initialization in create_write_functor
- ASAN/UBSAN: Added patterns for known issues in HSA runtime, HIP, perfetto
- Disabled attachment tests for sanitizers due to library preloading issues

Other Fixes
-----------
- Thread-trace agent test: Use heap-allocated callback state
- Correlation ID: Refactored reference counting and finalization ordering

* [rocprofiler-sdk] Revert buffer pool design changes

Revert buffer.cpp and buffer.hpp to the original double-buffer
design from develop branch. The pool-based redesign introduced
concerns about:
- Signal safety (mutex vs atomic_flag)
- API changes (flush() return type)
- Complexity of the new design

This revert removes:
- Dynamic buffer pool with std::deque
- std::mutex/condition_variable synchronization
- buffer_correlation_ordering.cpp test
- buffer_ordering_stress.cpp test

The underlying buffer flush ordering issue will need to be
addressed with a different approach that preserves the original
API and synchronization characteristics.

* [rocprofiler-sdk] Consistent fini_status checks to prevent correlation ID creation during finalization

- Revert TOCTOU CAS loop change in sub_ref_count() - not needed with consistent checks
- Add fini_status check in correlation_tracing_service::construct() with ROCP_CI_LOG warning
- Add nullptr checks at all construct() call sites (queue.cpp, async_copy.cpp, memory_allocation.cpp)
- Change all 'get_fini_status() > 0' to '!= 0' for consistent behavior:
  - hsa/queue.cpp (lines 105, 210)
  - hsa/async_copy.cpp (line 344)
  - hsa/hsa_barrier.cpp (line 43)
  - buffer.cpp (lines 107, 138, 185)

This ensures no correlation IDs are created once finalization starts (fini_status != 0),
preventing races between finalization and ongoing tracing operations.

* [rocprofiler-sdk] Replace arrival-order checks with timestamp-based temporal validation

Buffer records are not guaranteed to arrive in any specific order. Tests and
samples should use timestamps for temporal ordering validation instead.

Changes:
- samples/external_correlation_id_request: Replace 'retired prematurely' arrival
  order check with timestamp-based validation that retirement timestamp >=
  max(end_timestamps) for records with the same correlation ID
- tests/external_correlation.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/registration.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/roctx.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check

Correlation IDs are not guaranteed to be monotonically increasing when records
are sorted by timestamp. Temporal ordering should be validated using the
timestamp fields in each record.

* [rocprofiler-sdk] Revert external/CMakeLists.txt SYSTEM keyword removal

Restore the SYSTEM keyword to target_include_directories for
rocprofiler-sdk-fmt to match develop branch.

* [rccl] Remove orphaned rocSHMEM gitlink

Remove orphaned submodule reference that was introduced during a merge
but never had a corresponding .gitmodules entry, causing CI failures
with "fatal: no submodule mapping found in .gitmodules".

* [rocprofiler-sdk] Add HSA ABI version 0x09 support

Add ABI checks for HSA_AMD_EXT_API_TABLE_STEP_VERSION 0x09 which
introduces hsa_amd_counted_queue_acquire and hsa_amd_counted_queue_release
functions (added in rocr-runtime SWDEV-561708).

* [rocprofiler-sdk] Handle finalized status gracefully in buffer flush operations

This commit consolidates fixes for handling the finalization status during
buffer flush operations across the SDK.

Changes:
- Tool and samples: Handle ROCPROFILER_STATUS_ERROR_FINALIZED gracefully
  when flushing buffers, as this indicates buffers were already flushed
  during finalization (not an error condition)
- HSA handlers (queue.cpp, async_copy.cpp, hsa_barrier.cpp): Use > 0 check
  for fini_status to allow operations during finalization process
- buffer.cpp: Revert fini_status checks to use > 0 for consistency
- correlation_id.cpp: Add fini_status > 0 check with ROCP_TRACE logging
  to prevent correlation ID creation after finalization starts

Files modified:
- source/lib/rocprofiler-sdk-tool/tool.cpp
- tests/tools/json-tool.cpp
- source/lib/rocprofiler-sdk/tests/registration.cpp
- source/lib/rocprofiler-sdk/tests/roctx.cpp
- samples/api_buffered_tracing/client.cpp
- samples/counter_collection/buffered_client.cpp
- samples/counter_collection/device_counting_async_client.cpp
- samples/external_correlation_id_request/client.cpp
- samples/pc_sampling/client.cpp
- source/lib/rocprofiler-sdk/buffer.cpp
- source/lib/rocprofiler-sdk/context/correlation_id.cpp
- source/lib/rocprofiler-sdk/hsa/queue.cpp
- source/lib/rocprofiler-sdk/hsa/async_copy.cpp
- source/lib/rocprofiler-sdk/hsa/hsa_barrier.cpp

* [rocprofiler-sdk] Remove hsa_tool_hooks and simplify buffer flush handling

Remove the hsa_tool_hooks infrastructure and simplify buffer flush calls
in samples and tools. The ERROR_FINALIZED handling was overly complex
and the hsa_tool_hooks OnUnload synchronization is no longer needed.

Changes:
- Remove hsa_tool_hooks.cpp/hpp and related registration.cpp code
- Simplify buffer flush calls in samples to use direct ROCPROFILER_CALL
- Simplify buffer flush in tool.cpp and json-tool.cpp
- Remove ERROR_FINALIZED special handling from test files

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Fix output_stream move semantics to null source pointers

The default move constructor and move assignment operator for
output_stream did not null out the source's pointers after the move.
This caused double-close when the moved-from temporary was destroyed,
leading to use-after-free crashes (SIGSEGV in std::ostream::sentry).

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Improve Perfetto trace writer and sanitizer configuration

- generatePerfetto.cpp: Move output_stream into shared_state to prevent
  use-after-free race conditions during Perfetto callback execution
- run-ci.py: Simplify and consolidate sanitizer environment variable
  configuration for better maintainability

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Revert run-ci.py changes that broke sanitizer suppressions

The previous changes removed MEMCHECK_SANITIZER_OPTIONS which is required
for CTest to properly pass suppression files to the sanitizers during
memcheck runs.

Co-Authored-By: Claude <noreply@anthropic.com>

* Revert "[rccl] Remove orphaned rocSHMEM gitlink"

This reverts commit 1ad21003941355658fff8114fa27768f11a948f7.

* [rocprofiler-sdk] Revert registration.cpp changes

Revert changes to registration.cpp to match develop branch.

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Remove suppression file content printing from run-ci.py

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix output_stream move ctor/assignment operator

* Fix erroneous revert of registration.cpp

* Fix handling of fini status in correlation ID construction

* [rocprofiler-sdk] Fix OMPT segfault during finalization

Add nullptr checks in OMPT tracing code to handle the case where
correlation_tracing_service::construct() returns nullptr during
finalization. This fixes segfaults in openmp-target-sample and
tests.integration.execute.openmp-tools.

The correlation ID construction now returns nullptr when fini_status > 0,
but the OMPT callbacks were not checking for this, causing crashes when
dereferencing the null pointer during OpenMP runtime shutdown.

Changes:
- event_common(): Return nullptr early if correlation ID is null
- event(): Check for nullptr before calling sub_ref_count()
- ompt_task_create_callback(): Return early if correlation ID is null
- ompt_task_schedule_callback(): Return early if correlation ID is null

* [rocprofiler-sdk] Fix HSA API tracing segfault during finalization

Add nullptr check in hsa_api_impl::functor after correlation ID
construction. During finalization, correlation_service::construct()
returns nullptr, and without this check the code would dereference
the null pointer when accessing corr_id->internal.

This fixes the SEGV at address 0x000000000008 (null + 8 byte offset)
that occurs when HSA async event threads call hsa_signal_destroy
during runtime shutdown after finalization has started.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2026-01-27 13:27:54 -05:00
Benjamin Welton d496bcef18 Fix dimension mismatch for multi-GPU systems with identical architect… (#1440)
* Fix dimension mismatch for multi-GPU systems with identical architectures

This change addresses an issue where counter dimensions were incorrectly
shared across all GPU agents with the same architecture name, even when
those agents had different hardware configurations (e.g., different CU counts).

Changes:
- Updated getBlockDimensions() to accept agent ID instead of architecture name
- Made dimension cache agent-specific instead of architecture-specific
- Updated set_dimensions() in AST evaluation to use specific agent ID
- Modified all API functions to handle agent-specific dimension lookups
- Updated tests to work with agent-specific dimensions

This fix ensures that dimensions accurately reflect the actual hardware
configuration of each individual GPU agent, preventing dimension mismatches
in multi-GPU systems where GPUs share the same architecture but have
different physical configurations.

Counter ID Representation Changes:
- Modified counter_id encoding to include agent information in bits 37-32
- Agent logical_node_id is encoded as (value + 1) to ensure agent 0 is detectable
- Counter records internally store only 16-bit base metric IDs (bits 15-0)
- Tool reconstructs agent-encoded counter IDs from base metric ID & agent info
- Instance record counter_id field uses bitwise AND mask to extract base metric ID
  (counter_id.handle & 0xFFFF) to fit in 16-bit storage
- Output generators (CSV, JSON, Perfetto) use agent-encoded IDs for consistency
- Updated counter_config.cpp and metrics.cpp to extract base metric ID when needed
- All counter lookups now properly handle agent-encoded vs base metric IDs

This ensures counter IDs are consistent between metadata and output records while
maintaining compact storage in instance records.
2025-10-27 07:58:20 -07:00
Giovanni Lenzi Baraldi dbb48c3e33 Fix for dynamic code object loading in the thread trace sample (#1386)
* Fix for dynamic code object loading in the thread trace sample

* Review comments
2025-10-21 16:22:26 +02:00
Jonathan R. Madsen 4cca398b56 [rocprofiler-sdk] Update rocprofiler-sdk CONTRIBUTING.md (#1371) 2025-10-20 21:46:24 -05:00
Giovanni Lenzi Baraldi 9849073836 SWDEV-540648: Adding realtime clock to v3 tool. Update decoder header. (#666)
* SWDEV-540648: Adding realtime clock to v3 tool. Update header for decoder.

* Adding tests

* Review comments

* Review comment
2025-09-10 12:39:27 +02:00
Kandula, Venkateshwar reddy b072e7e38c [Samples] Remove thread trace sample dependency on rocprofiler-sdk-amd-comgr. (#555)
* remove samples dependency on rocprofiler-sdk-amd-comgr.

* add find package for amd_comgr.

---------

Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>

[ROCm/rocprofiler-sdk commit: e0901eba28]
2025-07-31 10:24:32 -05:00
Madsen, Jonathan 35d491159f [CMake] Fix thread trace sample ENVIRONMENT test property (#544)
Fix thread trace samples set tests properties

[ROCm/rocprofiler-sdk commit: 735b5c3d4a]
2025-07-24 15:23:37 -05:00
Baraldi, Giovanni 4ca156e572 Thread trace and Trace Decoder API tests and samples (#416)
* Adding test and samples to decoder

* Fix sample

* Formatting

* Fix multi test

* Disable sample

* Fix tests

* Format

* Version fix

* Locking the decoder

* Add atomic

* Review comments

* Format

* Adding readme

* merge conflict and adding PCS+ATT test

* Review comments

* Properly disable PCS test

* Update tests/rocprofv3/advanced-thread-trace/CMakeLists.txt

* Adding back env var test

* Name fix

* Preload sample

* Addressing review comments

* Update docs

---------

Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>

[ROCm/rocprofiler-sdk commit: e898079a13]
2025-07-22 20:08:12 -05:00
Kandula, Venkateshwar reddy 0ff0ffffa2 [SDK] Expose counter dims in rocprofiler_counter_info_v1_t and only show counters being profiled in metadata. (#325)
* expose dimensional info in rocprofiler_counter_info_v1_t.

* add counter_id in dim info.

* address review comments

* format.

* address comments.

* use array of pointers for dimensions_instaces.

* format and comments.

* address comments.

* new line.

* Update counter_defs.yaml

* Update counter_defs.yaml

* Update counter_defs.yaml

* counter_defs.

* format counter defs.

* format counter defs.

* format counter defs.

* show only counters being profiled in metadata.

* Format.

* use config for counters and fix warnings.

* add version for rocprofiler_counter_dimension_info_v1_t struct.

* rename rocprofiler_counter_record_dimension_instance_v1_info_t.

* account device id from pmc for counters metadata.

* move dim structs to counters.h.

* address comments to compare value.

* fix tests.

* Address comments. use pointer of arrays for ABI.

* rebase.

* fix build error.

* use separate metadata::init() for rocprofv3.

* also print not found counters.

* precompute all the perf counters needed to be in metadata.

* Misc.

* format

* Format.

* rocprofiler::sdk::container::c_array

* Address comments.

* source/lib/output/metadata.cpp

* lint.

* add unit test for c_array.

* add unit test and serialization support for c_array container.

* Misc.

* Clean files.

* Format.

* clang-tidy.

* add more checks to c_array.

* misc. typo

* Addr comments.

---------

Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
Co-authored-by: Jonathan R. Madsen <Jonathan.Madsen@amd.com>

[ROCm/rocprofiler-sdk commit: bf0fad1d54]
2025-07-22 14:24:25 -07:00
Madsen, Jonathan fff7146ab8 [CI] Disable other unstable tests (#490)
* Disable other unstable tests

* Disable validating test_total_runtime in kernel-tracing

* The disabled tests will be stabilized and re-enabled by ROCm 7.0.1 or ROCm 7.1 

[ROCm/rocprofiler-sdk commit: 69f71b8097]
2025-06-30 15:42:38 -05:00
Madsen, Jonathan 839c07c4aa [CI] Testing stability (#486)
* [CI] Testing Stability

- CMake option ROCPROFILER_DISABLE_UNSTABLE_CTESTS
  - used for tests which periodically fail around 1 out of every 10 runs
  - set to ON while instability remains, this needs to set to OFF in ROCm 7.1 or, ideally, ROCm 7.0.1
- Use FIXTURES_SETUP and FIXTURES_REQUIRED for some tests
- replace "threw an exception" with "${ROCPROFILER_DEFAULT_FAIL_REGEX}" for misc FAIL_REGULAR_EXPRESSIONS

* Remove contents of all EXCLUDE_{TESTS,LABEL}_REGEX from CI workflow

* Disable patch git step in code-coverage run

* Tweak spin time of reproducible runtime

* Removed patch git step in code-coverage run

* Update ROCPROFILER_DEFAULT_FAIL_REGEX

* Mark test-counter-collection tests as unstable

- add fixtures setup/required

* Remove ATTACHED_FILES_ON_FAIL

- CDash doesn't store enable downloading these properly anyway

* Relax collection-period fuzzing window

* Disable unstable collection-period test

- too unstable

* formatting

* Disable unstable device_counting_service_test.async_counters

* Suppress perfetto internal data race errors

* Switch code-coverage CI jobs to mi300 runner

* Timeout increases

* rocprofv3-test-rocpd updates

- add fixtures
- switch executable
- redefine input/output paths

* Revert code-coverage job to mi300a runner

* Update rocprofv3-test-rocpd-execute-multiproc

- reduce problem size

* disable multiproc rocpd

* Split code-coverage into separate workflow

- network issues cause this job to fail frequently
- when in a separate workflow, it can be restarted easily

* Fixtures for rocprofv3-test-trace-hip-in-libraries

* Disable unstable device_counting_service_test.sync_counters

* Potential fix for code scanning alert no. 171: Workflow does not contain permissions

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Switch code-coverage to run on rocprof-azure

- mi300a EMU runner set is unstable (network issues)

* tests/rocprofv3/pc-sampling SKIP_REGULAR_EXPRESSION

* Update rocprofv3-test-list-avail-trace-execute

- reduce log level and increase timeout

* rocprofv3: Prevent recursive call to rocprofv3_error_signal_handler + log chaining

* rocprofv3: Use ROCP_ERROR + std::exit instead of ROCP_FATAL

- should help with SKIP_REGULAR_EXPRESSION

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

[ROCm/rocprofiler-sdk commit: 640ca55ac0]
2025-06-30 15:07:37 -05:00
Kandula, Venkateshwar reddy 8f5d00ca5d [SDK] Add gfx950 targets for tests and samples (#399)
* add gfx950 targets.

* add gfx950 targets to ci workflows.

* Format.

---------

Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
Co-authored-by: Vaddireddy, Sushma <Sushma.Vaddireddy@amd.com>

[ROCm/rocprofiler-sdk commit: 375866383b]
2025-06-26 14:25:08 -05:00
Kuricheti, Mythreya bde07e7baa [SDK] KFD new events API (#321)
* Remove page-migration

* Add KFD events API

* Address review comments

* Move assert checks

* Update enum-string utils

* Update codeowners

* Update KFD header

* Add perfetto category

[ROCm/rocprofiler-sdk commit: 8a461afe20]
2025-06-26 13:28:45 -05:00
Elwazir, Ammar 4d79e1df30 [SDK] Support CMake option for using internal RCCL tracing + (temporary) enable in CI (#457)
* Temp: disable RCCL tracing

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Adding option to disable rccl tracing from CMake

* Update codeql.yml

* Misc updates

- ROCPROFILER_BUILD_RCCL -> ROCPROFILER_INTERNAL_RCCL_API_TRACE
- env.EXTRA_TEMP_CMAKE_OPTIONS -> env.GLOBAL_CMAKE_OPTIONS
- add (advanced) option ROCPROFILER_INTERNAL_RCCL_API_TRACE

* Fix rocprofiler::sdk::get_enum_label

- missing enum labels for HIP_RUNTIME_API_TABLE_STEP_VERSION > 8

* Update tests/rocprofv3/advanced-thread-trace/CMakeLists.txt

- improve various aspect of cmake -- particularly echoing where attdecoder_LIBRARY was found

* Use CMAKE_MESSAGE_INDENT

- add prefix to cmake messages to help indicate where messages are coming from
- make find_package(Python3 ...) QUIET for bindings

* Fix rocprofiler::sdk::get_enum_label

- handle HSA_AMD_EXT_API_TABLE_MAJOR_VERSION

* Fix rocprofv3 message for att library path

* Fix tests/rocprofv3/advanced-thread-trace/att_input.yml config

* Fix rocprofv3 check_att_capability + soversion/version library resolution

- Account for ROCPROF_ATT_LIBRARY_PATH in env in check_att_capability
- Add resolve_library_path
  - supports resolution of library names to SOVERSION and VERSION paths

* Fix python linting error (unused import)

---------

Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: aeb1621c2b]
2025-06-17 08:32:54 -05:00
Indic, Vladimir e5097d6a36 [ROCm 7.0] [PC sampling] Disable PC sampling tests if the GFX arch doesn't support it (#436)
Disable PC sampling tests if the GFX arch doesn't support it

[ROCm/rocprofiler-sdk commit: a24f486732]
2025-06-06 18:47:47 +02:00
Bhardwaj, Gopesh d47d3d3e9c Fixing SDK compilation issues due to missing header (#433)
[ROCm/rocprofiler-sdk commit: 918270bf63]
2025-06-04 13:39:41 +05:30
Madsen, Jonathan adc4e6995d [SDK] Support HIP 7.0 API changes (#432)
* [Do not merge] Make changes to api_args

* Support HIP 7.0 API changes

- Provide ROCPROFILER_SDK_ variants of ROCPROFILER_ version defines
- Provide ROCPROFILER_SDK_COMPUTE_VERSION
- hipCtxGetApiVersion changes parameter from int* to unsigned int*
- hipMemcpyHtoD and hipMemcpyHtoDAsync changed void* to const void*

* Fix comment

---------

Co-authored-by: Jatin Chaudhary <jatchaud@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: caf1a2174e]
2025-06-03 21:50:50 -05:00
Madsen, Jonathan b097e276a9 [rocprofv3] Add rocpd output support (part 1: prelude) (#401)
* [rocprofv3] Add rocpd output support (part 1: prelude)

- git submodules for sqlite3, GOTCHA, and pybind11
- HIP stream data
- rocprofiler_query_intercept_table_name(...)
- serialization load
- rocprofiler::sdk::get_perfetto_category(KindT)
- rocprofiler::sdk::parse::strip
- common library updates
  - md5sum
  - hasher
  - simple_timer
  - static_tl_object
  - get_process_start_time_ns(pid_t)
- output library updates
  - node_info
  - file_generator (generator is now virtual base class)
  - stream info updates

* Added submodules

* Code review updates

* Minor unused-but-set-X warning fixes

* Update CI

- install libsqlite3-dev package

* Update CI

- install libsqlite3-dev package

* Fix static thread-local object memory leak

- also fix signal handler chaining

* Remove URL from comment

* Remove page migration exception

* Enable ROCPROFILER_BUILD_SQLITE3 by default

- try find_package(SQLite3) first and then build when ROCPROFILER_BUILD_SQLITE3=ON

* Fix gotcha installation

- make install of target optional

* Validate tracing + counter collection dispatch data

- i.e. correlation ids, thread ids, timestamps

* Make find_package(SQLite3) optional

- ROCm CI does not have SQLite3 dev package installed and cannot build from source (missing tclsh)

* Fixes to tracing + counter collection test

* get_process_start_time_ns update

- original implementation did not work

* Fix pytest-packages test_perfetto_data for counter collection

- erroneous failure when used with same PMC + multiple agents

* cmake policy: option() honors normal variables

- for GOTCHA submodule

* Improve samples/api_buffered_tracing stability

- reduce likelihood of sporadic exception throw

* Update gotcha submodule

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 7166b1ab58]
2025-05-18 20:11:26 -05:00
Madsen, Jonathan 9f7703f918 Build system (libdw), correlation ID, and shebang fixes (#354)
* Fix compilation for output library

- link to targets for ATT (amd-comgr, dw, elf)

* Relax correlation ID retirement log failures

- only fail for correlation ID retirement underflow when building in CI mode

* Fix shebang for several files

- license was inserted before shebang in several places

* Update code coverage exclude folders for samples

* Tweak to agent tests

- test to make sure hsa agent is not the old value instead of testing that it is the new value

* Fix libdw include/link

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 3580478426]
2025-04-27 20:16:18 -05:00
Madsen, Jonathan 50bc3ed803 [Misc] Rework header includes (#311)
* Update header file includes

* Fix includes for lib/rocprofiler-sdk/hip/hip.hpp

* Minor touch ups

* Minor include improvements

* Doxygen tweak

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: c5a3edc3fa]
2025-04-15 14:02:12 -07:00
Bhardwaj, Gopesh 8778237298 doc improvements for 1.0.0 part 2 (#330)
* update installation steps

* Github Issue #50 Adding README's for samples

* Making name change to ROCprofiler-SDK for consistency

* Fix HIP trace documentation

* Fix HSA trace in docs

* Fix kernel trace in docs

* Fixing memory copy and memory allocation traces

* runtime trace and sys trace doc update

* Fix scratch memory doc

* kernel naming and filtering options

* Adding collection period in docs

* Perfetto configs update

* summary output file

* kernel trace format fix

* update CHANGELOG

* Agent index doc update

* rocm-smi output

* group by queue option

* Updated --group-by-queue description

* perfetto visualization

---------

Co-authored-by: Ian Trowbridge <Ian.Trowbridge@amd.com>

[ROCm/rocprofiler-sdk commit: ca7cce9e81]
2025-04-15 13:30:07 -07:00
Meserve, Mark d80c047fd2 Additional 1.0.0 changes (#317)
* Additional 1.0.0 changes

- Update VERSION
- Add beta compatibility for rocprofiler_agent_set_profile_callback_t

* Fix location of deprecated typedef rocprofiler_agent_set_profile_callback_t

* rocprofiler_record_counter_t -> rocprofiler_counter_record_t

* Experimental + deprecated annotations

* rocprofiler_record_dimension_info_t -> rocprofiler_counter_record_dimension_info_t

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: a1fcdf7f83]
2025-03-26 02:12:03 -05:00
Welton, Benjamin 692d041316 [SDK] Release 1.0 Public API Modifications (#277)
* Make sure all structs/enums can be forward declared

* Updates to counter collection

- consistency updates and cleanup

* Conversion of dimension information to info struct

* Added deprecated folder

* Testing changes

* merge changes

* Fix shadowed variable

* Source code formatting

* Fix shadowed variable

* Update rocprofiler_counter_info_v1_t member names

* Split version.h into version.h and ext_version.h

- ext_version.h contains external version info, e.g. ROCPROFILER_HSA_API_TABLE_MAJOR_VERSION, ROCPROFILER_HSA_RUNTIME_VERSION
- this reduces amount of recompilation after a commit since version.h gets updated with the git revision

* profile_config -> counter_config

* EOF new line

* [Samples] Reduce header includes + reorg counter collection samples

* Misc compilation fixes

- shadowed variables
- use of [[deprecated("...")]] in C code
- unused variables

* Minor misc modifications

- use common:: instead of rocprofiler::common:: when inside rocprofiler namespace
- counters.cpp
  - move local anon namespace functions into rocprofiler::counters:: anon namespace
  - use std::string_view for get_static_string
  - const ref for get_static_ptr
  - misc namespace shortening

* [Public API] rocprofiler_get_version_triplet + rocprofiler_version_triplet_t

- struct rocprofiler_version_triplet_t containing fields for the major, minor, and patch version
- public API function: rocprofiler_get_version_triplet
- define C++ operators for rocprofiler_version_triplet_t
- C++ function compute_version_triplet

* [Tests] Improve async-copy-testing test

- relax constraints
- improve logging

* Update counter_config.h doxygen docs

* ROCPROFILER_SDK_BETA_COMPAT

- ppdef which helps with renaming when set to 1

* Remove spurious include

* Fix includes for cxx/version.hpp

* Doxygen fixes for rocprofiler_get_version and rocprofiler_get_version_triplet

* Public API Experimental Designation

- ROCPROFILER_SDK_EXPERIMENTAL added to experimental function
- "(experimental)" added to doxygen @brief entries

* Fix use of assert instead of static_assert in hip/stream.cpp

* Use typedef instead of define for rocprofiler_profile_config_id_t

* Use inline rocprofiler_{create,destroy}_profile_config instead of ppdef

- added <rocprofiler-sdk/deprecated/profile_config.h>

* Doxygen for rocprofiler_{create,destroy}_profile_config

* ROCPROFILER_SDK_DEPRECATED_WARNINGS

* Temporarily comment out ROCPROFILER_SDK_DEPRECATED_WARNINGS=1

* cmake formatting

* Misc variable renaming in samples and tests

* Fix declarations of types

* Fix hip stream tracing service struct name

- rocprofiler_callback_tracing_stream_handle_data_t renamed to rocprofiler_callback_tracing_hip_stream_api_data_t

* Rename "HIP_STREAM_API" to "HIP_STREAM"

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>

[ROCm/rocprofiler-sdk commit: 4cd121e27b]
2025-03-24 12:07:33 +05:30
Madsen, Jonathan 36f4788ad5 [CI] Miscellaneous Testing Updates (#305)
* Add rocprofiler-sdk-utilities.cmake

- contains cmake function rocprofiler_sdk_get_gfx_architectures

* Update perfetto_reader.py

- fix hash collision

* Update project names in tests folders

- rocprofiler-tests -> rocprofiler-sdk-tests

* Fix incorrect allocation-error handling

* [CI] Disable openmp tests for navi2, navi3, and navi4

* Suppress leaks by omptarget and llvm

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 2d072f9217]
2025-03-22 18:51:42 -05:00
Indic, Vladimir 0ca07105a3 [SDK][rocprofv3] MI300 Stochastic PC sampling (#92)
* MI300 Stochastic PC sampling SDK API implementation

* ROCProfV3: Stochastic PC sampling Support (#94)

* ROCProfV3: MI300 Stochastic PC sampling initial draft

* ROCProfV3: Initial Stochastic PC sampling Tests (#95)

ROCProfV3: Initial Stochastic PC sampling tests

* Update rocprofiler_pc_sampling_record_stochastic_v0_t

- update doxygen docs for members
- replace rocprofiler_correlation_id_t with rocprofiler_async_correlation_id_t

* Relax the check in JSON tests

* drain PC sampling buffer during finalize_rocprofv3

* Increase timeout for "Test Install Build" step

- 10 minutes -> 20 minutes
- "Test Installed Packages" has 20 minutes so "Test Install Build" should also

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 49ce79a5b5]
2025-03-21 14:40:45 -05:00
Bhardwaj, Gopesh 705a2adbd3 [CI] Disable OpenMP test for sanitizers (#296)
* Disable openmp test for sanitizers

* fixing review feedback

[ROCm/rocprofiler-sdk commit: 4dcb239872]
2025-03-20 13:56:14 -05:00
Bhardwaj, Gopesh c66e315ad0 fix building OpenMP test target (#292)
* fix building openmp test target

* cmake format correction

* cmake format correction

[ROCm/rocprofiler-sdk commit: e0859f7d33]
2025-03-18 14:16:18 +05:30
Bhardwaj, Gopesh 9764f96427 removing gfx940 and gfx941 targets (#286)
* removing gfx940 and gfx941 targets

* updated changelog

[ROCm/rocprofiler-sdk commit: f5c9663c51]
2025-03-17 15:21:12 -05:00
Baraldi, Giovanni ac6e512e25 SWDEV-516846: Fix serialization services conflicts and ATT counter streaming (#230)
* Update TT API

* Rework serialization

* update att_core

* Fix tests

* Fix tool

* Formatting

* Fix perfcounter

* Formatting

* Rename agent TT

* Format

* Workaround for codeQL alert

* Tidy fix

* Fix compiler error

* Tidy

* Fix some tests

* Fixing some tests

* formatting

* Fixing ATT serialization

* Format

* Fix test commandline

* Fixing init order

* Format

* Tidy fixes

* Removing unused sample

* Fix tests and schema

* Added ATT + PMC test

* Fix mode

* Fix file mode

* Review comments

* Fix typo

* Review comments

* Review comments

* Fix missing id inc after review comment

* Review comments

* Suggested Fixes

* Testing changes

* Test fix

* Build fixes

* Minor build fix

---------

Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Welton, Benjamin <Benjamin.Welton@amd.com>

[ROCm/rocprofiler-sdk commit: 821918a512]
2025-03-14 18:11:10 -07:00
Welton, Benjamin 509298ba75 [SWDEV-518071] Return HSA not loaded status (device counter collection) (#242)
* [SWDEV-518071] Return HSA not loaded status (device counter collection)

This is a state that a caller would want to know about to understand if
they got no counters because of a failure or if they were trying to
collect counters too early (as is the case in the sample, which can
attempt to collect counters before HSA is inited).

* Minor edit

* format

* [SWDEV-518081] Simplify Metric Loading (#243)

* [SWDEV-518071] Return HSA not loaded status (device counter collection)

This is a state that a caller would want to know about to understand if
they got no counters because of a failure or if they were trying to
collect counters too early (as is the case in the sample, which can
attempt to collect counters before HSA is inited).
* [SWDEV-518324] Add AST update support

Allows the ability for ASTs to be updated (instead of an unchangable
static value). Adds a shared pointer return type to protect against
static destructors/modifications from invalidating potentially in use
AST definitions. No functionality/use changes in this PR.
* [SWDEV-518593] Add updatable dimension cache + fix string issues (#252)

* [SWDEV-518593] Add updatable dimension cache + fix string issues

Updates dimension cache to use the same design pattern as AST/Metrics.

Fixes the string scoping issue seen in ASTs, which appears here as well.

* Add rocprofiler_create_counter

Creates derived counters based on input from the API. This PR does three
things:

1. Adds the API + test case
2. Validates that an AST can be constructed from the counter supplied.
3. Updates metrics, ast, and dimension caches to include the new metric.

Metric should be available for use immediately after the call completes.

Due to the regeneration of ASTs, this call should not be performed in
performance sensitive code.

* Suggestion fixes

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>

* Minor tweak

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>

* Fixes for comments

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>

[ROCm/rocprofiler-sdk commit: 007285272b]
2025-03-14 01:07:16 -07:00
Madsen, Jonathan 17272d5df1 Re-enable OpenMP target and testing (#126)
* Re-enable OpenMP target and testing

* Enable openmp target tests on mi200 jobs

* Fix direct self-inclusion of header file

* Enable openmp-target testing on vega20

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Welton, Benjamin <Benjamin.Welton@amd.com>

[ROCm/rocprofiler-sdk commit: 2fe63d873e]
2025-03-13 22:29:07 -07:00
Madsen, Jonathan d7495f9f1a Re-enable clang-tidy for core workflows + clang-tidy fixes (#197)
* Ensure the clang-tidy is updated + clang-tidy fixes

* update-ci workflow

* Enable clang-tidy checks

* Add extra logging to device counter collection samples

* Misc clang-tidy fixes

* Disable device counter collection samples for ThreadSanitizer

* Formatting

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 070b659a9a]
2025-02-11 10:58:47 -06:00
Welton, Benjamin b90f127957 [SWDEV-513658] Force HSA_AMD_MEMORY_POOL_EXECUTABLE_FLAG value to be used with HSA calls (#192)
* Force HSA_AMD_MEMORY_POOL_EXECUTABLE_FLAG  value to be used with HSA calls

Fix for CI

* More tweaks

* Increase reproducible-runtime kernel sleep granularity

* Fix data race in synchronous device counter collection sample

* Update device counting service

- add get_active_context function

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 080b2ba451]
2025-02-10 11:34:26 -06:00
Madsen, Jonathan e677801859 Undefined behavior warnings caught by ROCPROFILER_DEFAULT_FAIL_REGEX (#23)
* Add regex for undefined behavior to ROCPROFILER_DEFAULT_FAIL_REGEX

- add UBSAN_OPTIONS to setup-sanitizer-env.sh

* Improve ROCPROFILER_DEFAULT_FAIL_REGEX

* Use -fno-sanitize-recover=undefined flag

- this compiler flag causes all undefined behavior errors to exit

* Revert ROCPROFILER_DEFAULT_FAIL_REGEX

* fix for shift overflow

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Manjunath-Jakaraddi <manjunath.jakaraddi@amd.com>

[ROCm/rocprofiler-sdk commit: e743bf5a93]
2025-02-06 08:55:57 -06:00
Welton, Benjamin 6bd6bb1aec Add example for synchronous reading of device counters (#64)
* Add example for synchronous reading of device counters

We already have test cases for this use case but this a sample
such that our collaborators can have a place to quickly pull
code from for use on their end (and to serve as a working example).

* Formatting fix

* Formatting fix

* Minor change for testing

---------

Co-authored-by: Benjamin Welton <ben@amd.com>

[ROCm/rocprofiler-sdk commit: 6c396adf83]
2025-02-06 08:35:55 -06:00
Indic, Vladimir 9f7772d90b Temporarily allow only host-trap sampling (#156)
[ROCm/rocprofiler-sdk commit: e4d736839d]
2025-01-27 13:26:11 -06:00
Rawat, Swati edb51fc861 update copyright date to 2025 (#102)
* Update LICENSE

* Update conf.py

* Update copyright year

* [fix] Update copyright year

* Update copyright year "ROCm Developer Tools"

* Add license headers to c++ files

* Add license to *.py

* Update licenses in rocdecode sources

---------

Co-authored-by: srawat <120587655+SwRaw@users.noreply.github.com>
Co-authored-by: Mythreya <mythreya.kuricheti@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 97b7a6315d]
2025-01-22 19:11:20 -06:00
Baraldi, Giovanni f8442415f8 SWDEV-492607: Adding ATT wrapper (#40)
* Adding att parser wrapper

* Adding ATT tests as optional

* Adding decoder API for query capability

* Removed samples

* Formatting

* adding new line

* Removed perfetto and moved to static library

* using default search for lib

* Updated to SDK

* Namespace changes

* Added tests

* Small refactor

* Updated API to receive agent_id

* Fixing tests

* Tidy fixes

* Not write to file

* Switch to filesystem.hpp

* Compilation fixes

* Formatting

* Tidy fix

* Removed likely

* Adding tests

* Added gfx9 test

* Adding gfx12 tests

* Formatting

* Enable tidy

* Fix tests

* Fix deadlock on agent test

* Workaround ASAN

* Moving query outside class.

* Fix standalone tool

* Addressing comments

* Formatting

* Change query name

* Fixed some tests. Updated PR comments.

* Formatting

* Improved coverage

* Formatting

* Fix for comments

* Formatting

* Adding some description. Fix error type.

---------

Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>

[ROCm/rocprofiler-sdk commit: 2c8e88a76b]
2024-12-18 18:53:32 -08:00
Trowbridge, Ian b2aa29eb0a Skip Building of OpenMP Samples and Tests (#77)
* Add option to disable openmp samples

* Skip building openmp tests and samples for now

[ROCm/rocprofiler-sdk commit: 9de284a568]
2024-12-18 11:37:51 -06:00
Baraldi, Giovanni 661b608227 SWDEV-489158: Fix for exit thread safety (#61)
* SWDEV-489158: Fix for exit thread safety

* Fixed exit thread logic

* Force CI to rerun

* Remove .vscode

* Fix thread safety bug

* Addressed some comments

* Formatting

---------

Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>

[ROCm/rocprofiler-sdk commit: f4984f9dcc]
2024-12-11 12:41:19 -06:00
Welton, Benjamin 1850de7ee1 [AFAR VII] rocprofiler_sample_device_counting_service return data as part of API call (#57)
---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Benjamin Welton <ben@amd.com>

[ROCm/rocprofiler-sdk commit: 253c9adfc1]
2024-12-06 22:37:45 -08:00
Indic, Vladimir 00b558c037 PC Sampling API: emit info logs instead of error (#53)
* PC Sampling API: emit info logs instead of error

Inside PC sampling API, emit info logs instead of
error logs. The tests verifies status code of each
API call and decide when to skip, instead of relying
on messages in logs.

The samples_processing.cpp test has been removed as it's
not used.

[ROCm/rocprofiler-sdk commit: b4d7ee7887]
2024-12-06 20:40:30 +01:00
Madsen, Jonathan a79f8a0198 SDK: OMPT Support (#22)
* Ability to select alternative compiler per file

Implementation of ompt interface to rocprofiler SDK. task_create and task_schedule are not supported.

Misc updates

Update OpenMP target sample

- samples/ompt -> samples/openmp_target
- fix sample test of openmp-target
- reorganize files

Rework OpenMP implementation

Minor OpenMP implementation cleanup

Rename samples/openmp_target CMake targets

Add tests/bin/openmp

- OpenMP target test app in tests/bin/openmp/target

Format samples/openmp_target CMakeLists.txt

Misc lib/rocprofiler-sdk/openmp cleanup

- fix includes
- convert_arg

Update openmp.def.cpp

- tweak includes
- remove lots of temporary variables

Update samples

- common::get_callback_id_names() -> common::get_callback_tracing_names()
- add kernel dispatch, memory copy, scratch memory buffered tracing to openmp target sample

Fix code object operation names

- add "CODE_OBJECT_" prefix

Update include/rocprofiler-sdk/openmp/api_id.h

- remove spurious comment

Miscellaneous openmp updates

- similar API for openmp_begin and openmp_end
- move implementations of ompt callbacks to openmp.cpp
- ompt_{thread_begin,thread_end,parallel_begin,parallel_end}_callbacks are openmp_events

[SWDEV-484495] Fix int truncation in CSV output (#1098)

CSV output truncates doubles to ints when it shouldn't. Derived metrics
are (mostly) doubles and lose precision (or become worthless) if treated
as an int. Converted these to double to match the format we return from
rocprof-sdk.

Co-authored-by: Benjamin Welton <ben@amd.com>

Update limit for max counter records in rocprof-tool (#1073)

A fixed sized std::array is used to store counter records in rocprofiler SDK. This limit was breached in SWDEV-484742. Upping the limit to 512 to be less likely to reach this limit again.

adding proxy ompt_data_t * arguments

fixes for proxy pointers

- Implement proxy ompt_data_t* pointers for clients
- Add ompt_data_t* arguments back to callback API
- Modify openmp sample to illustrate use of proxy pointers

formatting

SWDEV-467350: Skipping tool counter iteration for unsupported hardware (#1083)

Fixing some accumulate metrics (#1089)

* Fixing some accumulate metrics

* Fixing some more accumulate metrics

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>

updating rocprofv3 help options (#1113)

* updating rocprofv3 help options

* updating CHANGELOG

Fixing installed pacakge tests in CI (#1119)

* Fixing installed pacakge tests in CI

* Formatted rocprofv3.py with black formatter

SWDEV-488948: PC Sampling - Correlation class to provide some thread safety. Adding multithread tests. (#1112)

* SWDEV-488948: PC Sampling - Correlation class to provide some thread safety. Adding multithread tests.

* Update source/lib/rocprofiler-sdk/pc_sampling/parser/correlation.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/parser/correlation.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

* Adding backlog for codeobj changes

* Formatting

* Update source/lib/rocprofiler-sdk/pc_sampling/code_object.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/code_object.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

---------

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

SWDEV-487621: Fixes for metric definitions (#1118)

* Fixes for metric definitions

* Removing gfx8

* Update changelog

* Fixing unit tests

* Small fixes

* Fix for write size

Fix PSDB change (#1120)

Reverts change to `source/include/rocprofiler-sdk/callback_tracing.h`
from commit c77e4d3b80

clang-18 build fix for RCCL (#1123)

Removes ambiguity on const usage, which clang-18 complains about
(preventing build with warn error).

mem copy direction field update (#1124)

Adding Node-id for debugging with log level trace (#1090)

fix botched rebase

Per Jonathan to remove -rdynamic warning so CI will continue

pedantic formatting

Correct the package name of rocprofiler-sdk (#1126)

* Correct the package name of rocprofiler-sdk

ROCM VERSION(for ex: 60300) was missing in the package name.
Added the same

* Use cmake cache string while setting the variable for ROCm Version

* correct the cmake-format

---------

Co-authored-by: Ranjith Ramakrishnan <Ranjith.Ramakrishnan@amd.com>

Fixing kokkosp tool library packaging (#1121)

* Fixing kokkosp tool library packaging

* Update source/lib/rocprofiler-sdk-tool/kokkosp/CMakeLists.txt

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update CMakeLists.txt

* Update CMakeLists.txt

* Component Requirement in CPack

* Adding package dependency

* Update CMakeLists.txt

* Update rocprofiler_config_packaging.cmake

* Fix rocprofiler-sdk-tool-kokkosp BUILD/INSTALL RPATH

- CMAKE_INSTALL_LIBDIR doesn't help

* Add BUILD/INSTALL RPATH to rocprofv3-trigger-list-metrics

- fixes packaging issues

* Update packaging

- core depends on rocprofiler-sdk-roctx
- add CPACK_DEBIAN_PACKAGE_SHLIBDEPS_PRIVATE_DIRS to resolve inter-package dependencies

* Fix package depends version format

* Improve tests/rocprofv3/summary/validate logging

* Update CI workflow

- prioritize roctx package in Install Packages step

* Remove setting <package-name>_VERSION in config.cmake.in

- this is automatically handled by existence of <package-name>-config-version.cmake

* Update rocprofiler-sdk-config.cmake

- relax find_package versioning requirements to same major and minor version

* Update rocprofiler-sdk-config.cmake

- relax find_package versioning requirements (remove EXACT, specify range)

* Tweak CI workflow

* Update perfetto_reader.py

- better handle failure to load trace processor

* Misc cleanup for config packaging

* Update config packaging

* Update config packaging

* Revert perfetto for core-rpm packages

* Revert perfetto for core-rpm packages

- perfetto < 0.9.0

* Tweak tests/rocprofv3/summary/validate.py

- reorder some checks

---------

Co-authored-by: Ammar Elwazir <aelwazir@useocpm2m-387-013.amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

Clang Warning Fixes (#1131)

Builds prevented on clang-18

Adding start and end timestamp columns in csv (#1128)

* Adding start and end timestamp columns in csv

* Adding assert check for the counter timestamps

---------

Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>

rocprofv3: docs and help menu updates (#1129)

* doc updates

* Correcting ROCtx information

* Making ROCTx string consistent

* missing occurence

Renamed agent profiling service to device counting service (#1132)

* Renamed agent profiling service to device counting service

Name more aptly represents what agent profiling did (device wide
counter collection). Conversion of existing user code can be
performed by the following find/sed command:

find . -type f -exec sed -i 's/rocprofiler_agent_profile_callback_t/rocprofiler_device_counting_service_callback_t/g; s/rocprofiler_configure_agent_profile_counting_service/rocprofiler_configure_device_counting_service/g; s/agent_profile.h/device_counting_service.h/g; s/rocprofiler_sample_agent_profile_counting_service/rocprofiler_sample_device_counting_service/g' {} +

* Converted dispatch profile to dispatch counting service

* Debug for functioal counters test

* Minor changes for CI

* Minor fix

* More fixes for CI

* Update evaluate_ast.cpp

---------

Co-authored-by: Benjamin Welton <ben@amd.com>

Testing updated RPM dockers (#1136)

* Testing updated RPM dockers

* Trying to fix PSDB for test package dependency

Agent Profiling Fixes for Broken/Improper API Usage (#1122)

Prevent's multiple setups of agent profiling on the same agent.

Fixes agent read context to only read agents that were setup.

Prevent copy of agent profiling internal data struct and reset
hsa_signal on move to prevent inadvertant delete.

Simplifying PR template (#1139)

Implementation of ompt interface to rocprofiler SDK. task_create and task_schedule are not supported.

Fixing installed pacakge tests in CI (#1119)

* Fixing installed pacakge tests in CI

* Formatted rocprofv3.py with black formatter

Fix PSDB change (#1120)

Reverts change to `source/include/rocprofiler-sdk/callback_tracing.h`
from commit c77e4d3b80

delete unused files

added arguments to some OMPT buffter records

* Fix cmake issues

Remove rocprofiler_ompt_finalize_tool

- a public API function is not necessary: should just finalize rocprofiler-sdk

Fix duplicate ROCPROFILER_{BUFFER,CALLBACK}_TRACING_KIND_STRING

Add lib/rocprofiler-sdk/ompt.hpp

- declares rocprofiler::sdk::finalize_ompt

Remove change to tests/rocprofv3/summary/conftest.py

Add set_fini_status(1) back to registration.cpp

Deleted uneeded files

Incoporate OpenMP code and sample

Fix merge issues with amd-staging

Add push_correlation_id for OpenMP tasking; improve debugability

fixup bad merge

* Suppress OpenMP data race

* Fix openmp_target sample

* Enum and struct name changes + source code reorg

- remove mix of ompt and openmp
  - opted for ompt
- changes made for consistency
  - ompt_api -> ompt
  - openmp_api -> ompt
  - OPENMP -> OMPT

* Update tests and more renaming

- dest_device_num -> dst_device_num
- src_addr -> src_address
- dest_addr -> dst_address
- remove info_type::begin
- require OMP_TARGET_OFFLOAD

* Update openmp-target test/sample env and labels

* Formatting

* Tweaks to cmake for openmp target

- Disable for thread sanitizers due to preloading issue

* OpenMP target cmake updates

- remove gfx1010 (fails on mi300)
- OPENMP_GPU_TARGETS

* Remove device_unload and target_map_emi support

- these are never supported by AMD OpenMP compilers

* Update CI workflow

- exclude openmp-target tests from navi3 and vega20

---------

Co-authored-by: Larry Meadows <Lawrence.Meadows@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 00c46fd5e5]
2024-12-05 22:48:19 -06:00
Meserve, Mark f6c923e191 SWDEV-445864: SWDEV-445865: Update page migration events (#16)
* Update kfd ioctl header

- Adds new event for dropped events
- Mirrors kernel update by Philip Yang

* Add error code for page migration events

- Adds support for new error code field for page migration end events
  - Page migration end event is now generated for migration failure
  - Error code is zero for successful migration

* Add dropped event SMI event

- New event type indicates if events were dropped
  - Events are dropped if the buffer is full

[ROCm/rocprofiler-sdk commit: fc2513888f]
2024-12-05 20:44:10 +00:00
Vladimir Indic e11b553a26 PC sampling services provides dispatch id (#1209)
[ROCm/rocprofiler-sdk commit: 8d2ce4b475]
2024-11-21 11:10:31 -06:00
Vladimir Indic 42c6ffc0eb Host trap PC sampling uses new record type (#1207)
* Host trap PC sampling uses new record type

* removing redundant field

* formatting

* simplifying templates in the parser - no need for HostTrap boolean

* reviving some parser tests

* hw_id decoding on GFX9

* HW id parser test

* parser CID test

* Parser multigpu test

* removing rocprofiler_pc_sampling_record_t and some fields from hw_id

* simplifying parser context

* keep bench test internally

* initializing gfx9_hw_id_t differently

* anonymous struct first

* avoiding inlining initialization of struct

[ROCm/rocprofiler-sdk commit: bc52c17e64]
2024-11-20 14:02:47 -06:00
itrowbri 343a842c81 SWDEV-496634: Revert deprecation of hipHostMalloc and hipHostFree functions (#1186)
[ROCm/rocprofiler-sdk commit: 0356446654]
2024-11-11 11:47:31 -06:00
Mythreya 36d357337d Report page migration events as start/end (#793)
* Squashed commit of the following:

commit b76f2635f4b65599f03812a73d0cf410f5ada213
Author: Mythreya <mythreya.kuricheti@amd.com>
Date:   Fri Apr 26 00:29:09 2024 +0000

    Changed for PR feedback

commit bedb8ad566ff42fbf117b19202c26c507abcf8ac
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date:   Thu Apr 25 19:20:06 2024 -0500

    Fix installation

commit a98f8a69459a1450a1be9c98e20b3c1e7f2568c2
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date:   Thu Apr 25 19:16:35 2024 -0500

    Restructure the headers

commit 46489a020ffafdd5f4ce3f580469ff233ef67fe1
Author: Mythreya <mythreya.kuricheti@amd.com>
Date:   Tue Apr 23 23:31:10 2024 +0000

    Update hsa include

commit 8e795282cce348fc6aa736b7857b21aeb32aa20a
Author: Mythreya <mythreya.kuricheti@amd.com>
Date:   Tue Apr 23 23:02:32 2024 +0000

    Report page migration events as start/end

    * Updated tests accordingly
    * Page migration events are reported independently

commit 8784e5ad4895a626a2a8e4ac12f8021b34172bd4
Author: Mythreya <mythreya.kuricheti@amd.com>
Date:   Tue Apr 16 17:01:57 2024 +0000

    Update handling of dropped page migration events

    Previously, we dropped all locally buffered events when we detect that
    KFD has dropped some events. This may drop too many pending events too eagerly.

    When we receive an end event and cannot find the corresponding start,
    we can be sure that KFD has dropped some events in the immediate past.

    When this happens, we look through all locally buffered events and report
    the start events that are older than 10s as partial events --- they have
    no "end" information (we expect that the end events have been dropped).

    We also set the polling timeout to 10s to prevent the local buffer from
    getting too large with events waiting to be paired up.

    Updated tests

commit 2e8e0b07eeda9b5990e1ae8d28dcd3a035ce38e1
Author: Mythreya <mythreya.kuricheti@amd.com>
Date:   Tue Apr 16 17:01:31 2024 +0000

    Docs for triggers

* Fix page migration sample

* Fix hasher, kfd install

* Add hsa include
* Install KFD include dir

* Updates from code review

- single timestamp field
- node_id -> agent_id
- from_node -> from_agent
- to_node -> to_agent

* Misc revisions

* Remove page-migration install target

* Update page-migration pytest

* Tweak to serialization

* Address PR comments

* Update page-migration test

* Add cli args, update iterations

* Address PR comments

* Add abi.cpp for static_asserts
* Update page_migration gtest with only runtime tests
* Moved helpers into utils.hpp

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 363f85dc72]
2024-11-11 11:08:47 -06:00
Jonathan R. Madsen c17952fd23 rocprofv3: refactor and reorganize rocprofiler-sdk-tool library (#1138)
* Add rocprofv3-multi-node.md to source/lib/rocprofiler-sdk-tool

* Initial source re-organization

- create "output" static library

* Update include/rocprofiler-sdk/cxx/serialization.hpp

- add GPR count fields to kernel symbol serialization

* Add source/scripts/generate-rocpd.py

- reads one or more JSON output files from rocprofv3 and writes rocpd SQLite3 database
- Note: preliminary implementation

* More reorganization b/t lib/rocprofiler-sdk-tool and lib/output

* Updates to generate-rocpd.py

- add SQL views
- option: --absolute-timestamps -> --normalize-timestamps
- option: --generic-markers
- misc fixes with regards to getting the views working
- support marker names

* Update generate-rocpd.py

- Add --marker-mode option

* Update generate-rocpd.py

- Improve debugging of bad bulk SQLite statements

* Update rocprofv3-multi-node.md

- cleanup of proposed SQL schema

* lib/output/format_path.{hpp,cpp}

- rename format to format_path (in config.hpp and config.cpp)
- move format_path functionality to format_path.{hpp,cpp}

* Rework lib/output/tmp_file_buffer.{hpp,cpp}

* Update output_key.cpp

- support %cwd%, %launch_date%

* Rework lib/output/buffered_output.hpp

* Support csv_output_file constructed via domain_type

* Update lib/output/domain_type.{hpp,cpp}

- get_domain_trace_file_name
- get_domain_stats_file_name

* Update lib/rocprofiler-sdk-tool/tool.cpp

- tweak headers

* Update lib/output/generate*.cpp

- remove include of helpers.hpp
- CSV uses domain_type for filenames

* Update samples/counter_collection/per_dev_serialization.cpp

- make wait_on volatile

* Remove tool_table from lib/output and lib/rocprofiler-sdk-tool

- Also split various structs into their own files
  - lib/output/agent_info
  - lib/output/metadata
  - lib/output/kernel_symbol_info
  - lib/output/counter_info
- Implemented rocprofiler::tool::metadata

* Optimize rocprofiler_tool_counter_collection_record_t

- reduce the size of the struct from 24784 bytes to 8376 bytes

* Introduced output_config

- split subset of config (from tools library) into output_config to be able to configure the output generating functions separately from the tool library
- this is a significant step towards the output generating functions not relying on static global memory

* Stream chunks of data into output instead of loading all info memory

* Remove duplicate group_segment_size in rocprofiler_kernel_dispatch_info_t serialization

* Adding Q&A to rocprofv3-multi-node.md

* Remove all remaining include lib/rocprofiler-sdk-tool from lib/output

- migrated a fair amount of code from lib/rocprofiler-sdk-tool/helper.hpp to lib/output

* Update Q&A of rocprofv3-multi-node.md

* Fix minor compilation errors + minor cleanup

* Update hsa/async_copy.cpp

- when ROCPROFILER_CI_STRICT_TIMESTAMPS > 0, reduce the active_signal sync wait time

* Update profiling_time.hpp

- fix log messages for when start/end time is less/greater than enqueue/current CPU time

* Fix generate_stats for tool_counter_record_t

* Dictionary optimization for generate-rocpd.py

---------

Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>

[ROCm/rocprofiler-sdk commit: 5eb8c2658c]
2024-11-07 01:15:19 -06:00
Benjamin Welton 48fe6573a4 SDK: counter collection serialization per device (#1157)
Migrates profiler_serializer class in QueueController to have an instance per-agent instead of one globally. Other changes in this commit are to allow for maps of the queues associated with each agent to be passed to profiler_serializer when it is turned on/off. Existing test cases cover whether or not the kernels are serialized (multistream app). New test case added to show that this serialization only occurs on a per device level with a kernel launched on one device waiting for a value to be set on the other.

[ROCm/rocprofiler-sdk commit: 4a5b1d98c2]
2024-10-25 13:13:36 -07:00