58 İşleme

Yazar SHA1 Mesaj Tarih
Benjamin Welton 1517a398bf [rocprofiler-sdk] Buffer finalization fixes and HSA ABI 0x09 support (#2318)
* [rocprofiler-sdk] Fix buffer flush ordering and sanitizer CI improvements

Buffer Pool Design
------------------
Replace the fixed array-based double buffer with a dynamic pool design to
fix race conditions that caused "internal correlation id was retired
prematurely" errors.

The original design had a race where flush callbacks could be delivered
out-of-order: when buffer 0 fills and begins flushing, writes go to
buffer 1. If buffer 1 fills before buffer 0's flush completes, the
buffer index wraps back to 0 (which may still be flushing). Independent
flush tasks submitted to the thread pool can complete out of order.

The new pool design:
- Uses a std::deque of buffer instances that grows as needed
- Allocates buffers from the pool when the current buffer needs to flush
- Serializes flushes with a mutex to ensure FIFO callback ordering
- Returns buffers to the pool after flush completion
- Eliminates the race between buffer selection and write operations

New Unit Tests
--------------
- buffer_correlation_ordering.cpp: Tests that API records are always
  delivered before their corresponding retirement records
- buffer_ordering_stress.cpp: Stress tests buffer flush ordering under
  high contention with multiple threads rapidly filling buffers

HSA Tool Hooks
--------------
Added hsa_tool_hooks.cpp/hpp to register an HSA OnUnload callback that
waits for pending flush tasks before tool finalization, preventing
"retired prematurely" errors during HSA shutdown.

Sanitizer Improvements
----------------------
- LSAN: Set fast_unwind_on_malloc=1 to prevent deadlock in libgcc unwinder
- LSAN: Added suppressions for external tools (liblzma, liblsan, seq, strdup)
- TSAN: Added suppression for false positive on C++11 thread-safe static
  initialization in create_write_functor
- ASAN/UBSAN: Added patterns for known issues in HSA runtime, HIP, perfetto
- Disabled attachment tests for sanitizers due to library preloading issues

Other Fixes
-----------
- Thread-trace agent test: Use heap-allocated callback state
- Correlation ID: Refactored reference counting and finalization ordering

* [rocprofiler-sdk] Revert buffer pool design changes

Revert buffer.cpp and buffer.hpp to the original double-buffer
design from develop branch. The pool-based redesign introduced
concerns about:
- Signal safety (mutex vs atomic_flag)
- API changes (flush() return type)
- Complexity of the new design

This revert removes:
- Dynamic buffer pool with std::deque
- std::mutex/condition_variable synchronization
- buffer_correlation_ordering.cpp test
- buffer_ordering_stress.cpp test

The underlying buffer flush ordering issue will need to be
addressed with a different approach that preserves the original
API and synchronization characteristics.

* [rocprofiler-sdk] Consistent fini_status checks to prevent correlation ID creation during finalization

- Revert TOCTOU CAS loop change in sub_ref_count() - not needed with consistent checks
- Add fini_status check in correlation_tracing_service::construct() with ROCP_CI_LOG warning
- Add nullptr checks at all construct() call sites (queue.cpp, async_copy.cpp, memory_allocation.cpp)
- Change all 'get_fini_status() > 0' to '!= 0' for consistent behavior:
  - hsa/queue.cpp (lines 105, 210)
  - hsa/async_copy.cpp (line 344)
  - hsa/hsa_barrier.cpp (line 43)
  - buffer.cpp (lines 107, 138, 185)

This ensures no correlation IDs are created once finalization starts (fini_status != 0),
preventing races between finalization and ongoing tracing operations.

* [rocprofiler-sdk] Replace arrival-order checks with timestamp-based temporal validation

Buffer records are not guaranteed to arrive in any specific order. Tests and
samples should use timestamps for temporal ordering validation instead.

Changes:
- samples/external_correlation_id_request: Replace 'retired prematurely' arrival
  order check with timestamp-based validation that retirement timestamp >=
  max(end_timestamps) for records with the same correlation ID
- tests/external_correlation.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/registration.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/roctx.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check

Correlation IDs are not guaranteed to be monotonically increasing when records
are sorted by timestamp. Temporal ordering should be validated using the
timestamp fields in each record.

* [rocprofiler-sdk] Revert external/CMakeLists.txt SYSTEM keyword removal

Restore the SYSTEM keyword to target_include_directories for
rocprofiler-sdk-fmt to match develop branch.

* [rccl] Remove orphaned rocSHMEM gitlink

Remove orphaned submodule reference that was introduced during a merge
but never had a corresponding .gitmodules entry, causing CI failures
with "fatal: no submodule mapping found in .gitmodules".

* [rocprofiler-sdk] Add HSA ABI version 0x09 support

Add ABI checks for HSA_AMD_EXT_API_TABLE_STEP_VERSION 0x09 which
introduces hsa_amd_counted_queue_acquire and hsa_amd_counted_queue_release
functions (added in rocr-runtime SWDEV-561708).

* [rocprofiler-sdk] Handle finalized status gracefully in buffer flush operations

This commit consolidates fixes for handling the finalization status during
buffer flush operations across the SDK.

Changes:
- Tool and samples: Handle ROCPROFILER_STATUS_ERROR_FINALIZED gracefully
  when flushing buffers, as this indicates buffers were already flushed
  during finalization (not an error condition)
- HSA handlers (queue.cpp, async_copy.cpp, hsa_barrier.cpp): Use > 0 check
  for fini_status to allow operations during finalization process
- buffer.cpp: Revert fini_status checks to use > 0 for consistency
- correlation_id.cpp: Add fini_status > 0 check with ROCP_TRACE logging
  to prevent correlation ID creation after finalization starts

Files modified:
- source/lib/rocprofiler-sdk-tool/tool.cpp
- tests/tools/json-tool.cpp
- source/lib/rocprofiler-sdk/tests/registration.cpp
- source/lib/rocprofiler-sdk/tests/roctx.cpp
- samples/api_buffered_tracing/client.cpp
- samples/counter_collection/buffered_client.cpp
- samples/counter_collection/device_counting_async_client.cpp
- samples/external_correlation_id_request/client.cpp
- samples/pc_sampling/client.cpp
- source/lib/rocprofiler-sdk/buffer.cpp
- source/lib/rocprofiler-sdk/context/correlation_id.cpp
- source/lib/rocprofiler-sdk/hsa/queue.cpp
- source/lib/rocprofiler-sdk/hsa/async_copy.cpp
- source/lib/rocprofiler-sdk/hsa/hsa_barrier.cpp

* [rocprofiler-sdk] Remove hsa_tool_hooks and simplify buffer flush handling

Remove the hsa_tool_hooks infrastructure and simplify buffer flush calls
in samples and tools. The ERROR_FINALIZED handling was overly complex
and the hsa_tool_hooks OnUnload synchronization is no longer needed.

Changes:
- Remove hsa_tool_hooks.cpp/hpp and related registration.cpp code
- Simplify buffer flush calls in samples to use direct ROCPROFILER_CALL
- Simplify buffer flush in tool.cpp and json-tool.cpp
- Remove ERROR_FINALIZED special handling from test files

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Fix output_stream move semantics to null source pointers

The default move constructor and move assignment operator for
output_stream did not null out the source's pointers after the move.
This caused double-close when the moved-from temporary was destroyed,
leading to use-after-free crashes (SIGSEGV in std::ostream::sentry).

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Improve Perfetto trace writer and sanitizer configuration

- generatePerfetto.cpp: Move output_stream into shared_state to prevent
  use-after-free race conditions during Perfetto callback execution
- run-ci.py: Simplify and consolidate sanitizer environment variable
  configuration for better maintainability

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Revert run-ci.py changes that broke sanitizer suppressions

The previous changes removed MEMCHECK_SANITIZER_OPTIONS which is required
for CTest to properly pass suppression files to the sanitizers during
memcheck runs.

Co-Authored-By: Claude <noreply@anthropic.com>

* Revert "[rccl] Remove orphaned rocSHMEM gitlink"

This reverts commit 1ad21003941355658fff8114fa27768f11a948f7.

* [rocprofiler-sdk] Revert registration.cpp changes

Revert changes to registration.cpp to match develop branch.

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Remove suppression file content printing from run-ci.py

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix output_stream move ctor/assignment operator

* Fix erroneous revert of registration.cpp

* Fix handling of fini status in correlation ID construction

* [rocprofiler-sdk] Fix OMPT segfault during finalization

Add nullptr checks in OMPT tracing code to handle the case where
correlation_tracing_service::construct() returns nullptr during
finalization. This fixes segfaults in openmp-target-sample and
tests.integration.execute.openmp-tools.

The correlation ID construction now returns nullptr when fini_status > 0,
but the OMPT callbacks were not checking for this, causing crashes when
dereferencing the null pointer during OpenMP runtime shutdown.

Changes:
- event_common(): Return nullptr early if correlation ID is null
- event(): Check for nullptr before calling sub_ref_count()
- ompt_task_create_callback(): Return early if correlation ID is null
- ompt_task_schedule_callback(): Return early if correlation ID is null

* [rocprofiler-sdk] Fix HSA API tracing segfault during finalization

Add nullptr check in hsa_api_impl::functor after correlation ID
construction. During finalization, correlation_service::construct()
returns nullptr, and without this check the code would dereference
the null pointer when accessing corr_id->internal.

This fixes the SEGV at address 0x000000000008 (null + 8 byte offset)
that occurs when HSA async event threads call hsa_signal_destroy
during runtime shutdown after finalization has started.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2026-01-27 13:27:54 -05:00
Jonathan R. Madsen a2288eb50b [rocprofiler-sdk] Install unit tests and helper functions for integration tests (#921)
* [rocprofiler-sdk] Install unit tests and helper functions for integration tests

* Fix rocprofiler-sdk-tests-target export

* Fix handling of cmake policy CMP0174

* Remove -vv from new pytest.ini files

* add unit tests and integration tests.

* add path to ci workflow.

* misc. fixes.

* pc sampling tests.

* bug fixes.

* pc sampling tests fix.

* misc.

* Update CMakeLists.txt

* Update rocprofiler_config_install_tests.cmake, correct license name

* fix units tests install issues.

* fix counters_def file path.

* fix bug, arg shifting.

* vendor pytest-cmake.

* cmake config fix. missing endfunction()

* disable tests, 1.rocprofv3-trace-hip-libs. 2.kernel-tracing. 3.external_correlation 4.rocpd.

* disable buffered-tracing test and remove pytest-cmake from requirements.txt.

* disable hip-graph-tracing test.

* fix building standalone tests to load rocprofiler-sdk cmake package first and then find rocprofiler_sdk_pytest module.

* addressed comments: 1.add local bin path to code cov workflow. 2.add to cmake prefix path local bin. 3.use ROCPROFILER_MEMCHECK_PRELOAD_ENV_VALUE 4.misc. fix

* enabled back tests api_buffered, external_correlation_id, hip-graph, kernel-tracing, rocpd, tracing-hip-in-libraries. and misc fixes(formating, extra fixtures for agent-index tests.)

* cpack to use llvm bin for .hsaco debug symbols.

* psdb tests fixes.

* EOL.

* misc. fixes and Disable api_buffered_tracing, external_correlation_id, hip-graph-tracing, kernel-tracing, rocpd, summary, tracing-hip-libraries, tracing-plus-counter-collection.

* fix incorrect cmakelists file.

* strip smallkernel.bin

* format.

* revert disabled tests commit.

* misc. fix in counter tests.

* misc.

* search codeobj unit test assets in curr bin and install bin.

* refactor newly added rocpd tests.

* modify tests for newly added hip-host-tracing.

* add LD LIB path to units, psdb is failing due to libs not being found.

---------

Co-authored-by: Venkateshwar Reddy Kandula <venkateshwar.kandula1306@gmail.com>
Co-authored-by: Venkateshwar Reddy Kandula <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
2025-11-21 08:06:56 -06:00
Welton, Benjamin fbf17a42d4 [SWDEV-516561][1/2] Add MARKER_RANGE_EXTENT to capture ROCTX ranges (#363)
* [SWDEV-516561][1/2] Add MARKER_RANGE_EXTENT to capture ROCTX ranges

Range extent to capture all work between roctxpush/pop operations. Entry callback takes place during roxtxpush and exit callback takes place in roctxpop. This is primarily to allow us to keep an ancestor id on the ancestor stack such that all operations that take place within the push/pop context can be annotated as being apart of this range. With the current setup (where push and pop are two separate operations that need to be combined externally), we cannot keep an ancestor id on the stack and thus cannot tie tracing events to particular ranges.

Correlation id information is inherited from the push operation. Ancestor id needs to be added in a future commit that also outputs this ancestor to CSV.

Output:

```
[ctest] {'size': 64, 'kind': 7, 'operation': 1, 'correlation_id': {'internal': 1525, 'external': 0, 'ancestor': 1524}, 'start_timestamp': 2932551479402642, 'end_timestamp': 2932551491178449, 'thread_id': 3254861}
[ctest] {'size': 64, 'kind': 8, 'operation': 2, 'correlation_id': {'internal': 1525, 'external': 0, 'ancestor': 1524}, 'start_timestamp': 2932551479405878, 'end_timestamp': 2932551491181214, 'thread_id': 3254861}
```

Note: Kind 8 = range extent op.

* Merge fix

Revert several changes

source/lib/rocprofiler-sdk/marker/range_marker.*

- separate out range marker implementation for standard marker implementation

Update public API with marker core range

Support marker core range in sdk (source/lib/rocprofiler-sdk)

Transition rocprofiler-sdk-tool and output lib to use marker core range

Misc fixes for tests

Fix logic in lib/output/generate{CSV,Stats}.cpp

Update tests/rocprofv3/tracing-hip-in-libraries (marker validation)

Fix test_otf2_data

* Test fixes

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>

[ROCm/rocprofiler-sdk commit: 2c4e20b951]
2025-07-08 23:41:22 -07:00
Kuricheti, Mythreya bde07e7baa [SDK] KFD new events API (#321)
* Remove page-migration

* Add KFD events API

* Address review comments

* Move assert checks

* Update enum-string utils

* Update codeowners

* Update KFD header

* Add perfetto category

[ROCm/rocprofiler-sdk commit: 8a461afe20]
2025-06-26 13:28:45 -05:00
Madsen, Jonathan b097e276a9 [rocprofv3] Add rocpd output support (part 1: prelude) (#401)
* [rocprofv3] Add rocpd output support (part 1: prelude)

- git submodules for sqlite3, GOTCHA, and pybind11
- HIP stream data
- rocprofiler_query_intercept_table_name(...)
- serialization load
- rocprofiler::sdk::get_perfetto_category(KindT)
- rocprofiler::sdk::parse::strip
- common library updates
  - md5sum
  - hasher
  - simple_timer
  - static_tl_object
  - get_process_start_time_ns(pid_t)
- output library updates
  - node_info
  - file_generator (generator is now virtual base class)
  - stream info updates

* Added submodules

* Code review updates

* Minor unused-but-set-X warning fixes

* Update CI

- install libsqlite3-dev package

* Update CI

- install libsqlite3-dev package

* Fix static thread-local object memory leak

- also fix signal handler chaining

* Remove URL from comment

* Remove page migration exception

* Enable ROCPROFILER_BUILD_SQLITE3 by default

- try find_package(SQLite3) first and then build when ROCPROFILER_BUILD_SQLITE3=ON

* Fix gotcha installation

- make install of target optional

* Validate tracing + counter collection dispatch data

- i.e. correlation ids, thread ids, timestamps

* Make find_package(SQLite3) optional

- ROCm CI does not have SQLite3 dev package installed and cannot build from source (missing tclsh)

* Fixes to tracing + counter collection test

* get_process_start_time_ns update

- original implementation did not work

* Fix pytest-packages test_perfetto_data for counter collection

- erroneous failure when used with same PMC + multiple agents

* cmake policy: option() honors normal variables

- for GOTCHA submodule

* Improve samples/api_buffered_tracing stability

- reduce likelihood of sporadic exception throw

* Update gotcha submodule

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 7166b1ab58]
2025-05-18 20:11:26 -05:00
Kuricheti, Mythreya cd106cda3c [CI]: test-async-copy-tracing-validate : test_ancestor_ids (#364)
* Add ancestor ID to pftrace

* Simplify hipMemcpyAsync ancestor ID checks

* Review comments

[ROCm/rocprofiler-sdk commit: cac3ac1b98]
2025-04-25 11:19:17 -05:00
Trowbridge, Ian 5467c83188 rocDecode Buffer Tracing Support (#315)
* Added buffer tracing support for rocdecode and updated tests to work with buffer tracing

* Updated perfetto to output args individually rather than as a string list

* Updated docstrings and operation type, changed OTF2 code to remove warning due to change in operation type

* Updated tests for review comments

* Test args exist and return value

* Updated to use string entry

* Change function name

* Updated PR to reflect review comments

* Updated for PR review comments

* Change function name

[ROCm/rocprofiler-sdk commit: 077723337a]
2025-04-11 21:56:36 +00:00
Welton, Benjamin 692d041316 [SDK] Release 1.0 Public API Modifications (#277)
* Make sure all structs/enums can be forward declared

* Updates to counter collection

- consistency updates and cleanup

* Conversion of dimension information to info struct

* Added deprecated folder

* Testing changes

* merge changes

* Fix shadowed variable

* Source code formatting

* Fix shadowed variable

* Update rocprofiler_counter_info_v1_t member names

* Split version.h into version.h and ext_version.h

- ext_version.h contains external version info, e.g. ROCPROFILER_HSA_API_TABLE_MAJOR_VERSION, ROCPROFILER_HSA_RUNTIME_VERSION
- this reduces amount of recompilation after a commit since version.h gets updated with the git revision

* profile_config -> counter_config

* EOF new line

* [Samples] Reduce header includes + reorg counter collection samples

* Misc compilation fixes

- shadowed variables
- use of [[deprecated("...")]] in C code
- unused variables

* Minor misc modifications

- use common:: instead of rocprofiler::common:: when inside rocprofiler namespace
- counters.cpp
  - move local anon namespace functions into rocprofiler::counters:: anon namespace
  - use std::string_view for get_static_string
  - const ref for get_static_ptr
  - misc namespace shortening

* [Public API] rocprofiler_get_version_triplet + rocprofiler_version_triplet_t

- struct rocprofiler_version_triplet_t containing fields for the major, minor, and patch version
- public API function: rocprofiler_get_version_triplet
- define C++ operators for rocprofiler_version_triplet_t
- C++ function compute_version_triplet

* [Tests] Improve async-copy-testing test

- relax constraints
- improve logging

* Update counter_config.h doxygen docs

* ROCPROFILER_SDK_BETA_COMPAT

- ppdef which helps with renaming when set to 1

* Remove spurious include

* Fix includes for cxx/version.hpp

* Doxygen fixes for rocprofiler_get_version and rocprofiler_get_version_triplet

* Public API Experimental Designation

- ROCPROFILER_SDK_EXPERIMENTAL added to experimental function
- "(experimental)" added to doxygen @brief entries

* Fix use of assert instead of static_assert in hip/stream.cpp

* Use typedef instead of define for rocprofiler_profile_config_id_t

* Use inline rocprofiler_{create,destroy}_profile_config instead of ppdef

- added <rocprofiler-sdk/deprecated/profile_config.h>

* Doxygen for rocprofiler_{create,destroy}_profile_config

* ROCPROFILER_SDK_DEPRECATED_WARNINGS

* Temporarily comment out ROCPROFILER_SDK_DEPRECATED_WARNINGS=1

* cmake formatting

* Misc variable renaming in samples and tests

* Fix declarations of types

* Fix hip stream tracing service struct name

- rocprofiler_callback_tracing_stream_handle_data_t renamed to rocprofiler_callback_tracing_hip_stream_api_data_t

* Rename "HIP_STREAM_API" to "HIP_STREAM"

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>

[ROCm/rocprofiler-sdk commit: 4cd121e27b]
2025-03-24 12:07:33 +05:30
Madsen, Jonathan 36f4788ad5 [CI] Miscellaneous Testing Updates (#305)
* Add rocprofiler-sdk-utilities.cmake

- contains cmake function rocprofiler_sdk_get_gfx_architectures

* Update perfetto_reader.py

- fix hash collision

* Update project names in tests folders

- rocprofiler-tests -> rocprofiler-sdk-tests

* Fix incorrect allocation-error handling

* [CI] Disable openmp tests for navi2, navi3, and navi4

* Suppress leaks by omptarget and llvm

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 2d072f9217]
2025-03-22 18:51:42 -05:00
Kuricheti, Mythreya d1aeee3599 Add an option to disable perfetto debug annotations in json tool (#258)
* Add opt-in to disable perfetto annotations

Add an env option `ROCPROFILER_DISABLE_PERFETTO_ANNOTATIONS`
to disable perfetto function-arg annotations.

If there are a large number of records, the tests that use this tool timeout
on some machines

* Update iteration kind

* Remove test_retired_correlation_ids for page-migration

[ROCm/rocprofiler-sdk commit: bbacf70ec7]
2025-03-14 13:06:18 -07:00
Trowbridge, Ian 74efdad1aa rocJPEG API Tracing (#73)
* rocDecode API Tracing support

* Test bin file added to rocdecode. Need to add validate python methods

* Added option to not make rocDecode tests

* Added rocdecode and rocprofv3 tests

* Added csv test

* Address PR comments. Changed tests to use built-in rocstreambit decoder to remove ffmpeg dependancy. Changed cmake option to disbale tests rather than not build them. Tests work locally, but will fail until rocDecode is built with tracing enabled on CI

* Add option to avoid building rocdecode tests

* Added option to avoid building rocdecode bin file

* Support for rocJPEG API Trace

* Added newline to rocjpeg_version.h

* json-tool code added, initial test/bin commit

* Formatting

* Resolved rocjpeg bin test compilation errors

* Tests implemented. Perfetto module currently resulting in errors, so need to retest whenever it is fixed

* Formatting and compilation errors

* Minor fixes

* Copyright year update and minor fixes

* Doc update fix

* Added rocjpeg csv file in data

* Addresses review comments: Updated fixed Findroc.. and uses root directory as a hint, fixed documentation error, changed tables to use _CORE, minor style fixes

* Added rocdecode and rocjpeg to CI

* Removed rocdecode and rocjpeg from CI and added back build tests option

* Updated Cmake Files

* Added rocDecode and rocJPEG to CI

* Remove cmake line added in error

* Temporarily modified tests to pass if rocdecode or rocjpeg tracing are not supported for CI, cmake changes

* Added find_package for test

* Added back use of system rocDecode and rocJPEG, modifies system files to include prefix path

* Updated no-link to include INCLUDE_DIR/roc(decode|jpeg), added comments for tests

* Resolve merge conflicts and formatting

* Added regex find and replace instead of include for CI

* VAAPI package causing errors on Vega20

* Removed system rocjpeg and rocdecode use temporarily until cmake issues resolved

* Removed workflows regex

* Formatting and minor test modification

* Modified test for vega20

* Update rocDecode and rocJPEG cmake and tests

* Changelog

* Fix merge conflict

* Added back if-statements around add-tests since cmake-generator-expressions are resulting in errors when the packages are missing

* Removed if found statements, replaced with TARGET:EXISTS

* Skip json file for rocjpeg and rocdecode tests if not supported

* Add os import

---------

Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 31fe8858d1]
2025-02-21 13:43:49 -08:00
Madsen, Jonathan b610b5ff56 SDK: No bg thread if no clients use SDK (#123)
* SDK: No bg thread if no clients use SDK

* Update CHANGELOG

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 0fbe6cc7b6]
2025-02-06 08:34:56 -06:00
Rawat, Swati edb51fc861 update copyright date to 2025 (#102)
* Update LICENSE

* Update conf.py

* Update copyright year

* [fix] Update copyright year

* Update copyright year "ROCm Developer Tools"

* Add license headers to c++ files

* Add license to *.py

* Update licenses in rocdecode sources

---------

Co-authored-by: srawat <120587655+SwRaw@users.noreply.github.com>
Co-authored-by: Mythreya <mythreya.kuricheti@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 97b7a6315d]
2025-01-22 19:11:20 -06:00
Trowbridge, Ian 3e1b8ba4ec rocDecode API Tracing Support (#49)
* rocDecode API Tracing support

* Test bin file added to rocdecode. Need to add validate python methods

* Added option to not make rocDecode tests

* Added rocdecode and rocprofv3 tests

* Added csv test

* Address PR comments. Changed tests to use built-in rocstreambit decoder to remove ffmpeg dependancy. Changed cmake option to disbale tests rather than not build them. Tests work locally, but will fail until rocDecode is built with tracing enabled on CI

* Add option to avoid building rocdecode tests

* Added option to avoid building rocdecode bin file

* Merge conflict error

* CMake files changed in response to review comments. Attempting to implement callbacks.

* Turned off test building for rocdecode

* Minor fixes for review comments

* Review comments

* Updated formatting

* Document changes and format.hpp reversion. Need to remove iterate args support for now for later update.

* Remove iterate args support

* Remove iterate-args

* enforce abi versioning in macro if

* Fix doc error

* removed spaces to fix indentation error

---------

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>

[ROCm/rocprofiler-sdk commit: e307b89ca4]
2025-01-17 14:42:25 -08:00
Jakaraddi, Manjunath 82261be227 SWDEV-492623: Hip Host Function to Device Symbols Mapping (#18)
* Adding changes to register and read symbols from the hip fat binary

* adding json output for host_functions

* added error handling

* adding json tool support

* Adding tests

* formatting changes

* Adding documentation

* refactoring as per amd-staging

* Adding intializers and changing macros

* Fix page-migration background thread on fork (#31)

* Fix page-migration background thread on fork

After falling off main in the forked child, all the children
try to join on on the parent's monitoring thread. This results
in a deadlock. Parent is waiting for the child to exit, but
the child is trying to join the parent's thread which is
signaled from the parent's static destructors.

Even with just one parent and child, due to copy-on-write
semantics, a child signalling the background thread to join
will still block (thread's updated state is not visible
in the child).

This fix creates background treads on fork per-child with a
pthread_atfork handler, ensuring that each child has its own
monitoring thread.

* Formatting fixes

* Detach page-migration background thread and update test timeout

* Attach files with ctest

* Update corr-id assert

* Tweak on-fork, simplify background thread

* Revert thread detach

* Adding --collection-period feature in rocprofv3 to match v1/v2 parity (#9)

* Adding Trace Period feature to rocprofv3

* Adding feature documentation

* Update source/bin/rocprofv3.py

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Fixing format

* Moving to Collection Period and changing the input params

* Format Fixes

* Fixing rebasing issues

* Removing atomic include from the tool

* Adding more options for units, optimizing the code

* Fixing rocprofv3.py

* Fixing time conv & adding time controlled app

* Fixing format

* Changing to shared memory testing methodology

* use of shmem use

* Fix include headers for transpose-time-controlled.cpp

* Format upload-image-to-github.py

* Removing shmem and using only env var to dump timestamps from the tool

* Tool Fixes + Test Config

* Adding Tests

* Fixing Review comments

* Update trace period implementation

* Update trace period tests

* check between start and stop timestamps

* Merge Fix

* Update validate.py

* Improve safety of rocprofiler_stop_context after finalization

* Pass context id to collection_period_cntrl by value

* Adding 20 us error margin

* Ensure log level for collection-period test is not more than warning

---------

Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

* Update lib/rocprofiler-sdk/code_object/hip/code_object.*

- move error code check macros to implementation
- fix macros which check error code
- use constexpr values instead of #define

* Update lib/rocprofiler-sdk/code_object/hip/code_object.*

- debugging for error that cannot be locally reproduced

* Update lib/rocprofiler-sdk/code_object/hip/code_object.*

- improve error handling and logging

* Update lib/rocprofiler-sdk/code_object/hip/code_object.*

- tweak to non-fatal logging messages

* Update lib/rocprofiler-sdk/code_object/hip/code_object.*

- cleanup of logging messages

* Update host kernel symbol register data fields

* Update source/lib/rocprofiler-sdk/code_object/hip/code_object.hpp

---------

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Kuricheti, Mythreya <Mythreya.Kuricheti@amd.com>
Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com>
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 78d8f4b8ea]
2024-12-06 11:42:37 +00:00
Trowbridge, Ian 792329fefd SWDEV-492625 memory free functions (#11)
* SWDEV-492625: Track free memory HSA functions to help determine total amount of memory allocated on the system at any one time

* Minor fixes to address comments

* Update allocation size description

* Moved get function back to specialization, minor typo fixes

* Removed memory_operation_type field, removed memory_pool allocation enum, converted starting address to hex string for json format.

* Made conversion to hex_string a function, changed address to use union rocprofiler_address_t type, changed VMEM descriptors

* Removed as_hex from the global namespace

* Formatting

* Removed TRACK_EVENT for memory allocation, now TRACK_COUNTER for memory allocation is being performed

* Check if address was recorded before retrieving allocation size in generate Perfetto

* Formatting

* Update source/lib/output/generatePerfetto.cpp

* Explicitly disable app-abort tests

* Remove excluding app-abort test from workflow CI

- redundant bc these tests are explicitly marked as disabled now

---------

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 79006bb896]
2024-12-06 00:05:30 -06:00
Madsen, Jonathan a79f8a0198 SDK: OMPT Support (#22)
* Ability to select alternative compiler per file

Implementation of ompt interface to rocprofiler SDK. task_create and task_schedule are not supported.

Misc updates

Update OpenMP target sample

- samples/ompt -> samples/openmp_target
- fix sample test of openmp-target
- reorganize files

Rework OpenMP implementation

Minor OpenMP implementation cleanup

Rename samples/openmp_target CMake targets

Add tests/bin/openmp

- OpenMP target test app in tests/bin/openmp/target

Format samples/openmp_target CMakeLists.txt

Misc lib/rocprofiler-sdk/openmp cleanup

- fix includes
- convert_arg

Update openmp.def.cpp

- tweak includes
- remove lots of temporary variables

Update samples

- common::get_callback_id_names() -> common::get_callback_tracing_names()
- add kernel dispatch, memory copy, scratch memory buffered tracing to openmp target sample

Fix code object operation names

- add "CODE_OBJECT_" prefix

Update include/rocprofiler-sdk/openmp/api_id.h

- remove spurious comment

Miscellaneous openmp updates

- similar API for openmp_begin and openmp_end
- move implementations of ompt callbacks to openmp.cpp
- ompt_{thread_begin,thread_end,parallel_begin,parallel_end}_callbacks are openmp_events

[SWDEV-484495] Fix int truncation in CSV output (#1098)

CSV output truncates doubles to ints when it shouldn't. Derived metrics
are (mostly) doubles and lose precision (or become worthless) if treated
as an int. Converted these to double to match the format we return from
rocprof-sdk.

Co-authored-by: Benjamin Welton <ben@amd.com>

Update limit for max counter records in rocprof-tool (#1073)

A fixed sized std::array is used to store counter records in rocprofiler SDK. This limit was breached in SWDEV-484742. Upping the limit to 512 to be less likely to reach this limit again.

adding proxy ompt_data_t * arguments

fixes for proxy pointers

- Implement proxy ompt_data_t* pointers for clients
- Add ompt_data_t* arguments back to callback API
- Modify openmp sample to illustrate use of proxy pointers

formatting

SWDEV-467350: Skipping tool counter iteration for unsupported hardware (#1083)

Fixing some accumulate metrics (#1089)

* Fixing some accumulate metrics

* Fixing some more accumulate metrics

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>

updating rocprofv3 help options (#1113)

* updating rocprofv3 help options

* updating CHANGELOG

Fixing installed pacakge tests in CI (#1119)

* Fixing installed pacakge tests in CI

* Formatted rocprofv3.py with black formatter

SWDEV-488948: PC Sampling - Correlation class to provide some thread safety. Adding multithread tests. (#1112)

* SWDEV-488948: PC Sampling - Correlation class to provide some thread safety. Adding multithread tests.

* Update source/lib/rocprofiler-sdk/pc_sampling/parser/correlation.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/parser/correlation.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

* Adding backlog for codeobj changes

* Formatting

* Update source/lib/rocprofiler-sdk/pc_sampling/code_object.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/code_object.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

---------

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

SWDEV-487621: Fixes for metric definitions (#1118)

* Fixes for metric definitions

* Removing gfx8

* Update changelog

* Fixing unit tests

* Small fixes

* Fix for write size

Fix PSDB change (#1120)

Reverts change to `source/include/rocprofiler-sdk/callback_tracing.h`
from commit c77e4d3b80

clang-18 build fix for RCCL (#1123)

Removes ambiguity on const usage, which clang-18 complains about
(preventing build with warn error).

mem copy direction field update (#1124)

Adding Node-id for debugging with log level trace (#1090)

fix botched rebase

Per Jonathan to remove -rdynamic warning so CI will continue

pedantic formatting

Correct the package name of rocprofiler-sdk (#1126)

* Correct the package name of rocprofiler-sdk

ROCM VERSION(for ex: 60300) was missing in the package name.
Added the same

* Use cmake cache string while setting the variable for ROCm Version

* correct the cmake-format

---------

Co-authored-by: Ranjith Ramakrishnan <Ranjith.Ramakrishnan@amd.com>

Fixing kokkosp tool library packaging (#1121)

* Fixing kokkosp tool library packaging

* Update source/lib/rocprofiler-sdk-tool/kokkosp/CMakeLists.txt

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update CMakeLists.txt

* Update CMakeLists.txt

* Component Requirement in CPack

* Adding package dependency

* Update CMakeLists.txt

* Update rocprofiler_config_packaging.cmake

* Fix rocprofiler-sdk-tool-kokkosp BUILD/INSTALL RPATH

- CMAKE_INSTALL_LIBDIR doesn't help

* Add BUILD/INSTALL RPATH to rocprofv3-trigger-list-metrics

- fixes packaging issues

* Update packaging

- core depends on rocprofiler-sdk-roctx
- add CPACK_DEBIAN_PACKAGE_SHLIBDEPS_PRIVATE_DIRS to resolve inter-package dependencies

* Fix package depends version format

* Improve tests/rocprofv3/summary/validate logging

* Update CI workflow

- prioritize roctx package in Install Packages step

* Remove setting <package-name>_VERSION in config.cmake.in

- this is automatically handled by existence of <package-name>-config-version.cmake

* Update rocprofiler-sdk-config.cmake

- relax find_package versioning requirements to same major and minor version

* Update rocprofiler-sdk-config.cmake

- relax find_package versioning requirements (remove EXACT, specify range)

* Tweak CI workflow

* Update perfetto_reader.py

- better handle failure to load trace processor

* Misc cleanup for config packaging

* Update config packaging

* Update config packaging

* Revert perfetto for core-rpm packages

* Revert perfetto for core-rpm packages

- perfetto < 0.9.0

* Tweak tests/rocprofv3/summary/validate.py

- reorder some checks

---------

Co-authored-by: Ammar Elwazir <aelwazir@useocpm2m-387-013.amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

Clang Warning Fixes (#1131)

Builds prevented on clang-18

Adding start and end timestamp columns in csv (#1128)

* Adding start and end timestamp columns in csv

* Adding assert check for the counter timestamps

---------

Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>

rocprofv3: docs and help menu updates (#1129)

* doc updates

* Correcting ROCtx information

* Making ROCTx string consistent

* missing occurence

Renamed agent profiling service to device counting service (#1132)

* Renamed agent profiling service to device counting service

Name more aptly represents what agent profiling did (device wide
counter collection). Conversion of existing user code can be
performed by the following find/sed command:

find . -type f -exec sed -i 's/rocprofiler_agent_profile_callback_t/rocprofiler_device_counting_service_callback_t/g; s/rocprofiler_configure_agent_profile_counting_service/rocprofiler_configure_device_counting_service/g; s/agent_profile.h/device_counting_service.h/g; s/rocprofiler_sample_agent_profile_counting_service/rocprofiler_sample_device_counting_service/g' {} +

* Converted dispatch profile to dispatch counting service

* Debug for functioal counters test

* Minor changes for CI

* Minor fix

* More fixes for CI

* Update evaluate_ast.cpp

---------

Co-authored-by: Benjamin Welton <ben@amd.com>

Testing updated RPM dockers (#1136)

* Testing updated RPM dockers

* Trying to fix PSDB for test package dependency

Agent Profiling Fixes for Broken/Improper API Usage (#1122)

Prevent's multiple setups of agent profiling on the same agent.

Fixes agent read context to only read agents that were setup.

Prevent copy of agent profiling internal data struct and reset
hsa_signal on move to prevent inadvertant delete.

Simplifying PR template (#1139)

Implementation of ompt interface to rocprofiler SDK. task_create and task_schedule are not supported.

Fixing installed pacakge tests in CI (#1119)

* Fixing installed pacakge tests in CI

* Formatted rocprofv3.py with black formatter

Fix PSDB change (#1120)

Reverts change to `source/include/rocprofiler-sdk/callback_tracing.h`
from commit c77e4d3b80

delete unused files

added arguments to some OMPT buffter records

* Fix cmake issues

Remove rocprofiler_ompt_finalize_tool

- a public API function is not necessary: should just finalize rocprofiler-sdk

Fix duplicate ROCPROFILER_{BUFFER,CALLBACK}_TRACING_KIND_STRING

Add lib/rocprofiler-sdk/ompt.hpp

- declares rocprofiler::sdk::finalize_ompt

Remove change to tests/rocprofv3/summary/conftest.py

Add set_fini_status(1) back to registration.cpp

Deleted uneeded files

Incoporate OpenMP code and sample

Fix merge issues with amd-staging

Add push_correlation_id for OpenMP tasking; improve debugability

fixup bad merge

* Suppress OpenMP data race

* Fix openmp_target sample

* Enum and struct name changes + source code reorg

- remove mix of ompt and openmp
  - opted for ompt
- changes made for consistency
  - ompt_api -> ompt
  - openmp_api -> ompt
  - OPENMP -> OMPT

* Update tests and more renaming

- dest_device_num -> dst_device_num
- src_addr -> src_address
- dest_addr -> dst_address
- remove info_type::begin
- require OMP_TARGET_OFFLOAD

* Update openmp-target test/sample env and labels

* Formatting

* Tweaks to cmake for openmp target

- Disable for thread sanitizers due to preloading issue

* OpenMP target cmake updates

- remove gfx1010 (fails on mi300)
- OPENMP_GPU_TARGETS

* Remove device_unload and target_map_emi support

- these are never supported by AMD OpenMP compilers

* Update CI workflow

- exclude openmp-target tests from navi3 and vega20

---------

Co-authored-by: Larry Meadows <Lawrence.Meadows@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 00c46fd5e5]
2024-12-05 22:48:19 -06:00
Jonathan R. Madsen 1ea688c447 Runtime Initialization Tracing (#1105)
* Runtime initialization tracing

- calbacks and buffer entries notifying when a runtime has been initialized

* Minor cleanup to registration.cpp

* JSON tool implementation

* Increase perfetto_reader timeout

* Handle perfetto_reader timeout when attr doesn't exist

* clang-tidy fixes to memory_allocation.cpp

[ROCm/rocprofiler-sdk commit: 249c50fc40]
2024-11-18 20:50:29 -06:00
itrowbri 94f4f56c40 Memory Allocation Tracking (#1142)
* Initial commit: Need to implement wrapper function to collect data and test that wrapper function is correctly replacing core HSA functions

* Attempted to implement wrapper implementation for hsa memory allocation functions. Need to modify generate record files and test if implementation is working as expected

* Debugging and implementing generateCSV function

* Memory allocation size and starting address outputted to csv and json file formats

* Formatting

* Initial setup for OTF2 and Perfetto generation

* Collecting agent id for memory_allocation and formatting

* Modified memory_allocation.cpp to set up code for AMD_EXT commands

* Support for memory_pool_allocate added

* Removed accidently added file

* Made flag optional and added more OTF2 and Perfetto code. Needs testing to ensure perfetto and OTF2 works

* Formatting

* Fixed perfetto and otf2 output

* Fixed flag issue due to incorrect buffer use

* Updated documentation

* Small cleaning and comments

* Added test for HSA memory allocation tracing

* Fixed summary test validation errors due to allocation tracing. Added type to location_base to create unique event ids for allocation due to OTF2 trace error

* Decreased lower limit of hip calls for test

* Modified summary tests to vary number of allocate requests

* Minor fixes to address comments. Still need to address OTF2 comments

* Fix docs and changed OTF2 to use enum for type specified in location_base construction

* Fixed schema error

* Added vmem command tracking. Need to add test

* Updated test to work with vmem command and updated generateCSV to output int instead of hex string.

* OTF2 enum update and mispelling fix

* CI does not support Virtual Memory API. Removed vmem test. Will add back if CI is modifed to suport vmem API

* Update CMakeLists.txt for memory allocation test

* Updated summary test

* Minor fixes to address comments

* Moved domain_type.hpp enum to before LAST

* Fixed compile errors and formatting

* Fixed stats summary domain name error

* Added rocprofv3 test

* Page migration test fix

* Undo page migration test changes. Failures do not appear to have to do with memory allocation

[ROCm/rocprofiler-sdk commit: 3bd7773cf7]
2024-11-18 20:22:14 -06:00
Mythreya 36d357337d Report page migration events as start/end (#793)
* Squashed commit of the following:

commit b76f2635f4b65599f03812a73d0cf410f5ada213
Author: Mythreya <mythreya.kuricheti@amd.com>
Date:   Fri Apr 26 00:29:09 2024 +0000

    Changed for PR feedback

commit bedb8ad566ff42fbf117b19202c26c507abcf8ac
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date:   Thu Apr 25 19:20:06 2024 -0500

    Fix installation

commit a98f8a69459a1450a1be9c98e20b3c1e7f2568c2
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date:   Thu Apr 25 19:16:35 2024 -0500

    Restructure the headers

commit 46489a020ffafdd5f4ce3f580469ff233ef67fe1
Author: Mythreya <mythreya.kuricheti@amd.com>
Date:   Tue Apr 23 23:31:10 2024 +0000

    Update hsa include

commit 8e795282cce348fc6aa736b7857b21aeb32aa20a
Author: Mythreya <mythreya.kuricheti@amd.com>
Date:   Tue Apr 23 23:02:32 2024 +0000

    Report page migration events as start/end

    * Updated tests accordingly
    * Page migration events are reported independently

commit 8784e5ad4895a626a2a8e4ac12f8021b34172bd4
Author: Mythreya <mythreya.kuricheti@amd.com>
Date:   Tue Apr 16 17:01:57 2024 +0000

    Update handling of dropped page migration events

    Previously, we dropped all locally buffered events when we detect that
    KFD has dropped some events. This may drop too many pending events too eagerly.

    When we receive an end event and cannot find the corresponding start,
    we can be sure that KFD has dropped some events in the immediate past.

    When this happens, we look through all locally buffered events and report
    the start events that are older than 10s as partial events --- they have
    no "end" information (we expect that the end events have been dropped).

    We also set the polling timeout to 10s to prevent the local buffer from
    getting too large with events waiting to be paired up.

    Updated tests

commit 2e8e0b07eeda9b5990e1ae8d28dcd3a035ce38e1
Author: Mythreya <mythreya.kuricheti@amd.com>
Date:   Tue Apr 16 17:01:31 2024 +0000

    Docs for triggers

* Fix page migration sample

* Fix hasher, kfd install

* Add hsa include
* Install KFD include dir

* Updates from code review

- single timestamp field
- node_id -> agent_id
- from_node -> from_agent
- to_node -> to_agent

* Misc revisions

* Remove page-migration install target

* Update page-migration pytest

* Tweak to serialization

* Address PR comments

* Update page-migration test

* Add cli args, update iterations

* Address PR comments

* Add abi.cpp for static_asserts
* Update page_migration gtest with only runtime tests
* Moved helpers into utils.hpp

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 363f85dc72]
2024-11-11 11:08:47 -06:00
Jonathan R. Madsen 07f698a532 CMake: Consistently name CMake Targets (#1082)
* Change all rocprofiler-X target names to rocprofiler-sdk-X

* Update rocprofiler-sdk-config.cmake

- fix install tree target names
- simplify logic for using find w/ components and find w/o components

* Update rocprofiler-sdk-roctx-config.cmake

- simplify logic for using find w/ components and find w/o components

* Update samples/intercept_table/CMakeLists.txt

- demonstrate/test use of `find_package(rocprofiler-sdk ... COMPONENTS ...)`

[ROCm/rocprofiler-sdk commit: 74facf87a6]
2024-10-25 11:17:34 -05:00
venkat1361 3819a846da Check to force tools to initialize the ctx id to zero. (#1135)
* Check to force tool to initialize the ctx id to zero.

* initialize rocprofiler_context_id_t with 0 in units tests

* changelog

---------

Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>

[ROCm/rocprofiler-sdk commit: 3f91d90bbc]
2024-10-22 18:09:25 +05:30
Benjamin Welton 345fde3ebc Renamed agent profiling service to device counting service (#1132)
* Renamed agent profiling service to device counting service

Name more aptly represents what agent profiling did (device wide
counter collection). Conversion of existing user code can be
performed by the following find/sed command:

find . -type f -exec sed -i 's/rocprofiler_agent_profile_callback_t/rocprofiler_device_counting_service_callback_t/g; s/rocprofiler_configure_agent_profile_counting_service/rocprofiler_configure_device_counting_service/g; s/agent_profile.h/device_counting_service.h/g; s/rocprofiler_sample_agent_profile_counting_service/rocprofiler_sample_device_counting_service/g' {} +

* Converted dispatch profile to dispatch counting service

* Debug for functioal counters test

* Minor changes for CI

* Minor fix

* More fixes for CI

* Update evaluate_ast.cpp

---------

Co-authored-by: Benjamin Welton <ben@amd.com>

[ROCm/rocprofiler-sdk commit: bb69467765]
2024-10-18 14:14:11 +05:30
Jonathan R. Madsen fcd6cc45bd Package RCCL headers to support adding RCCL support w/o installed headers (#1075)
- in ROCm CI, rocprofiler-sdk gets built before RCCL is installed, this is a workaround for this issue

[ROCm/rocprofiler-sdk commit: 8c1382fceb]
2024-09-12 18:24:50 -05:00
Mythreya c47a128941 Add support for RCCL tracing (#1047)
* [Draft]: Add support for RCCL tracing

Address comments

* [Draft]: Add support for RCCL tracing

Address PR comments, changes from RCCL upstream

* Add RCCL library table registration

Working on adding support to rocprofiler-register

* Support compilation w/o <rccl/amd_detail/api_trace.h>

- dummy api_trace.h header
- return ROCPROFILER_STATUS_ERROR_NOT_IMPLEMENTED when RCCL does not have api_trace.h header

* RCCL API tracing tool support

- add to rocprofv3
- add to json-tool

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 2a146259c7]
2024-09-12 00:42:58 -05:00
Gopesh Bhardwaj 2a4591dcae Fix rocprofv3 output filename containing sub-directory (#1062)
* Fix -d option broken by hostname

* Fix rocprofv3 output filename containing directory

* Fix TID handling in Perfetto and OTF2 output

* Revert changes which removed hostname

* Revise tests/rocprofv3/tracing output filenames

- specify an output filename for tests which include a subdirectory

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: a4b3a57ecc]
2024-09-10 17:49:36 -05:00
Jonathan R. Madsen 264c48fa69 Misc API cleanup and consistency fixes (#1023)
- ROCPROFILER_API after function
- use rocprofiler_tracing_operation_t in lieu of uint32_t where appropriate
- rocprofiler_tracing_operation_t is not int32_t typedef (formerly uint32_t)
- use const T* instead of T* where appropriate

[ROCm/rocprofiler-sdk commit: bb25376480]
2024-08-20 01:06:12 -05:00
Gopesh Bhardwaj 2393b2ee45 look for symbols in dynsym table (#990)
* look for symbols in dynsym table

* checking both symtab and dynsym

* Avoid symbol duplication in non stripped binaries

* clang-format

* Minor elf_utils.cpp updates

- use 'else if' instead of 'if'
- logging tweaks

* Update registration

- tweak logging

* Update testing

- strip the rocprofiler-sdk-c-tool library
- add test-c-tool-rocp-tool-lib-execute test which does NOT LD_PRELOAD the library (uses only ROCP_TOOL_LIBRARIES instead)

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: dc671497da]
2024-07-26 10:21:04 +05:30
Jonathan R. Madsen 12ae4e6ce4 Migrate to rocprofiler-sdk:: namespace in CMake everywhere (#892)
- remove all usage/support for rocprofiler:: namespace

[ROCm/rocprofiler-sdk commit: a76f61a0a3]
2024-05-29 22:28:43 -05:00
Jonathan R. Madsen ada9d97b78 Miscellanous AFAR 5 Updates (#891)
* Dispatch table copy/update uses ROCP_TRACE instead of ROCP_INFO

* Update rocprofiler-sdk CMake config

- rocprofiler::rocprofiler is alias to rocprofiler-sdk::rocprofiler-sdk instead of other way around

* Prefer rocprofiler-sdk::rocprofiler-sdk over rocprofiler::rocprofiler

* Fix WITH_UNWIND for glog

- requires a value of "none" instead of boolean now

* Update include/rocprofiler-sdk/registration.h

- explicit struct names to permit forward decl

* Update include/rocprofiler-sdk/cxx/serialization.hpp

- ROCPROFILER_SDK_CEREAL_NAMESPACE_BEGIN and ROCPROFILER_SDK_CEREAL_NAMESPACE_END to enable customized namespace

[ROCm/rocprofiler-sdk commit: 5525b400c3]
2024-05-29 16:45:56 -05:00
Jonathan R. Madsen 9fde046078 JSON schema + JSON node vs. Perfetto category consistency (#869)
* Fix schema differences b/t JSON and perfetto

- "kernel_dispatches" (JSON) vs. "kernel_dispatch" (Perfetto) -> "kernel_dispatch"
- "scratch_api" (JSON) -> "scratch_memory"

* Update include/rocprofiler-sdk/cxx/serialization.hpp

- remove unnecessary includes causing circular dependency

* rocprofv3 docs

- Output formats
- JSON schema

* Spelling fix

* Update schema descriptions

* Improve assert log in test_perfetto_data

[ROCm/rocprofiler-sdk commit: b4dc4f8e92]
2024-05-22 20:36:27 -05:00
Jonathan R. Madsen 46a9637496 Adding Perfetto support (#867)
* Perfetto submodule

* include/rocprofiler-sdk/cxx/perfetto.hpp

- adapted from tests/common/perfetto.hpp
- updated json-tool to use <rocprofiler-sdk/cxx/perfetto.hpp>

* Update include/rocprofiler-sdk/cxx

- add details/delimit.hpp
- add details/join.hpp
- extend details/mpl.hpp
- extend details/operators.hpp

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- update MEMORY_COPY direction names

* Preliminary perfetto support

* Update lib/rocprofiler-sdk-tool/generatePerfetto.cpp

- fix getting roctx msg vs. buffer operation name

* Temporary variable restructuring

* Perfetto patches after rebasing onto main

* Revert lib/rocprofiler-sdk/hsa/async_copy.cpp

- revert name

* Update lib/rocprofiler-sdk-tool/generatePerfetto.cpp

- fix ReadTrace

* Update tests/bin/hip-in-libraries

- sleep_for

* Support PFTRACE output format option in rocprofv3

* Change perfetto logging

* Update rocprofv3 tests to generate pftrace output

* Minor tweak to json-tool.cpp

* Update requirements.txt for perfetto testing

* Fix data race on amount_read in generatePerfetto.cpp

* Add testing for pftrace output

- relatively simple testing which verifies that the pftrace file has the same number of entries as JSON data for HIP/HSA/marker/kernel/memory_copy

* Fix import in perfetto_reader.py

* Fix data race in generatePerfetto.cpp

[ROCm/rocprofiler-sdk commit: 957bb7a4e5]
2024-05-22 15:51:12 -05:00
Jonathan R. Madsen 0ba9f26a6a Adding JSON support (#860)
* Adding json support

minor bugs

Fixing tests

Fixing formatting issues

Fixing test

test fix

Misc testing fixes

Use rocprofiler/cxx/name_info in rocprofiler-sdk-tool

fixes to reduce the Json file size

Update source/lib/rocprofiler-sdk-tool/generateJSON.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/generateJSON.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/tool.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/tool.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/helper.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/helper.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/generateJSON.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/generateJSON.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/generateJSON.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/tool.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/tool.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/helper.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/tool.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/generateJSON.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/generateJSON.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

misc fixes

Removing int cast for JSON tests

formatting

removing a condition test on Navi3

adding debug info

Misc fix

* CSV updates

- fix stats
- numerical formatter support for customizing write_csv_entry
- misc formatting
- get_marker_stats_file

* Misc tests/rocprofv3/counter-collection/input2 fixes

- rocprofiler_configure_pytest_files in rocprofv3/counter-collection/input2
- removed state code from merge in rocprofv3/counter-collection/input2

* Tool: "Agent-id" -> "Agent_Id"

- consistency

* Tool update

- remove rocprofiler_tool_marker_record_t
- add marker_tracing_kind_conversion
- fix memory leak in write_json
- minor update to get_output_stream
- rework handling of marker records

* Update tests/pytest-packages/pytest_utils/__init__.py

- add collapse_dict_list function for converting a dictionary value that is a list of length one into a directly mapped value

* Update tests/rocprofv3/**/conftest.py

- use collapse_dict_list when reading in JSONs

* Update tests/rocprofv3/counter-collection/input1/validate.py

- relax testing requirements gfx1102 (AQLProfile bugs)
  - in addition to relaxed testing requirements for gfx1101

* Update tests/rocprofv3/tracing/validate.py

- fix removal of PID in every marker record

* Update tests/rocprofv3/tracing-plus-cc

- remove test design that relies on iterating subdirectories

* Wrapper around __libc_start_main

- Ensures finalization happens before main returns
- Update tests/rocprofv3/tracing/validate.py
  - wrapper around __libc_start_main changed roctx calls

* Combine include/rocprofiler-sdk/cxx/serialization.hpp and include/rocprofiler-sdk/external/serialization.hpp

- tests/common/serialization.hpp simply includes include/rocprofiler-sdk/cxx/serialization.hpp now

* Update lib/rocprofiler-sdk/hip/hip.cpp

- tracing function immediately returns when fini_status is non-zero

* Update lib/rocprofiler-sdk/hsa/hsa.cpp

- remove logging of tracing function when fini_status is non-zero

* Update lib/rocprofiler-sdk-tool/CMakeLists.txt

- remove rocprofv3_trigger_list_metrics.cpp from TOOL_SOURCES

* Update tests/rocprofv3/tracing-plus-cc/CMakeLists.txt

- fix depends

* Domain statistics

* Update tests/rocprofv3/tracing-plus-cc/CMakeLists.txt

- do not set ROCP_LOG_LEVEL in env

* Remove erroneous <bits/utility.h> include

* Restructure tool source + reduce tool table + support multiple formats

- buffered_output struct for handling output
- support multiple output formats, e.g. --output-format csv,json
- rename buffer_type_t -> domain_type
- simplified generation of CSV output files
- removed rocprofiler_tool_marker_record_t

* Update lib/common/container/ring_buffer.hpp

- value_type alias in ring_buffer<Tp>

* Remove all but one json-execute tests

- generate CSV and JSON in same run

* Fix include for domain_type.cpp

* Update tests/rocprofv3/tracing-plus-cc/input.txt

- only specify counters which can be found on gfx8, gfx9, gfx10, gfx11, etc.
- use :device= syntax

* Update lib/rocprofiler-sdk-tool/config.cpp

- support :device=N syntax for counters file
- improve stripping comments in PMC files
- only read after pmc:

* Rework tool library counter collection

- fatal error if all requested counters for device are not found
- support :device= syntax

* Update tests/rocprofv3/tracing-plus-cc/input.txt

- removed L2CacheHit (not supported on mi300)

* Disable JSON tests in tests/rocprofv3

* Update include/rocprofiler-sdk/cxx/serialization.hpp

- support rocprofiler_record_dimension_info_t

* Update tool JSON schema

- remove domain_type::CODE_OBJECT
- rocprofiler_tool_agent_v0_t
  - rocprofiler_agent_v0_t + counters
- rocprofiler_tool_counter_info_t
- get_code_object_data()

* Update JSON schema for tool

* Update lib/rocprofiler-sdk-tool/tool.cpp

- fix ROCP_WARNING_IF

* rocprofv3 -> rocprofv3.sh

- install rocprofv3.sh into sbin
- configure_file <source-tree>/rocprofv3.sh -> <binary-tree>/bin/rocprofv3

* Update tool counter collection

- rocprofiler_tool_record_counter_t
- rocprofiler_tool_counter_collection_record_t

* Update tests/rocprofv3/counter-collection/input1/CMakeLists.txt

- use rocprofiler_configure_pytest_files for validate.py, conftest.py, and input.txt

* Update tests/rocprofv3/counter-collection/input1/validate.py

- re-enable test_validate_counter_collection_pmc1_json

* Update tests/rocprofv3/counter-collection/input2/validate.py

- remove unused code

* Update tests/rocprofv3/counter-collection/input2/validate.py

- remove unused code

* Update tests/rocprofv3/hsa-queue-dependency/validate.py

- re-enable JSON tests

* Misc tests/rocprofv3 CMake updates

* Update tests/rocprofv3/tracing/validate.py

- re-enable JSON tests

* Update tests/rocprofv3/tracing-hip-in-libraries/validate.py

- re-enable JSON tests

* Update tests/rocprofv3/tracing/validate.py

- remove unused node_exists function

* Update tests/rocprofv3/tracing/validate.py

- fix test_marker_api_trace_json

---------

Co-authored-by: Sriraksha Nagaraj <Sriraksha.Nagaraj@amd.com>

[ROCm/rocprofiler-sdk commit: 92b7326910]
2024-05-22 00:53:42 -05:00
Jonathan R. Madsen f167317524 Public C++ header files and samples updates (#819)
* Public C++ header files (source/include/rocprofiler-sdk/cxx)

* Update samples/api_buffered_tracing

- scratch memory and page migration
- README

* Update samples/api_buffered_tracing

- page migration component in sample

* Update tests/page-migration/validate.py

- fix checks for page migration operation names

* Update tests/page-migration/validate.py

- fix get_allocated_pages

* Update scratch memory and page migration validations

* Fix include/rocprofiler-sdk/cxx installation

* Rework include/rocprofiler-sdk/cxx

- Improve name_info to support const char*, string_view, string

* Update samples/api_{buffered,callback}_tracing

* External correlation ID request sample

- includes correlation ID retirement demo

* Update samples/api_buffered_tracing/README.md

* Update lib/rocprofiler-sdk/hsa/queue.cpp

- generate correlation ID for kernel launch if one doesn't exist

* Remove priority check from tool libraries (samples/tests)

- if(priority > 0) return nullptr check in rocprofiler_configure has proliferated beyond its intended use

* Apply suggestions from code review

[ROCm/rocprofiler-sdk commit: de13d2ac5d]
2024-04-25 20:09:11 -05:00
Jonathan R. Madsen 1514f05887 Async memory copy callback tracing + memory copy size (#791)
* Async memory copy tracing update

- rocprofiler_buffer_tracing_memory_copy_record_t: thread_id and bytes
- support ROCPROFILER_CALLBACK_TRACING_MEMORY_COPY
- init_public_api_struct can fully construct

* Testing for callback async copy tracing

[ROCm/rocprofiler-sdk commit: 12c836f95f]
2024-04-18 04:31:59 -05:00
Jonathan R. Madsen 627a9f54f1 Simplify json-tool JSON schema for callback records (#790)
- formerly, the rocprofiler_callback_tracing_record_t data was stored in itr["record"], e.g. itr["record"]["correlation_id"]
  - dropped "record" key, e.g. itr["correlation_id"]

[ROCm/rocprofiler-sdk commit: 32bc339789]
2024-04-18 03:58:10 -05:00
Jonathan R. Madsen 2aef3c3d15 rocprofiler_kernel_dispatch_info_t + header record for buffered counter collection (#758)
* Update include/rocprofiler-sdk

- defines.h
  - ROCPROFILER_VERSION_10_0 -> ROCPROFILER_SDK_VERSION_0_0
- fwd.h
  - rocprofiler_counter_record_kind_t
  - rocprofiler_kernel_dispatch_info_t
  - rocprofiler_record_counter_t
    - has dispatch id instead of correlation id
  - rocprofiler_counter_info_v0_t
    - added rocprofiler_counter_id_t field
    - added is_constant field
    - reordered better packing
- dispatch_profile.h
  - added rocprofiler_profile_counting_dispatch_record_t for use as a header record for rocprofiler_profile_counting_dispatch_data_t
- callback_tracing.h
  - rocprofiler_callback_tracing_kernel_dispatch_data_t uses rocprofiler_kernel_dispatch_info_t
- buffer_tracing.h
  - rocprofiler_buffer_tracing_kernel_dispatch_record_t uses rocprofiler_kernel_dispatch_info_t

* Update lib/rocprofiler-sdk/*

- transition to rocprofiler_kernel_dispatch_info_t
- set id and is_constant values for rocprofiler_counter_info_v0_t in rocprofiler_query_counter_info

* Update lib/rocprofiler-sdk-tool

- transition to rocprofiler_kernel_dispatch_info_t

* Update lib/rocprofiler-sdk/counters/tests/core.cpp

- transition to rocprofiler_kernel_dispatch_info_t

* Update samples

- transition to rocprofiler_kernel_dispatch_info_t
- transition to rocprofiler_counter_record_kind_t

* Update tests

- transition to rocprofiler_kernel_dispatch_info_t
- transition to rocprofiler_counter_record_kind_t
- improve integration test validation for counter-collection
- update serialization for new/additional types

* Fix tests/counter-collection/validate.py

- loosen restrictions on the length of counter description

* Update include/rocprofiler-sdk/buffer_tracing.h

- remove accidental packed attribute

* Update lib/rocprofiler-sdk/counters/xml/derived_counters.xml

- Add description for TCC_TAG_STALL_sum (reference: https://rocm.docs.amd.com/en/develop/conceptual/gpu-arch/mi300-mi200-performance-counters.html)

* Update tests/page-migration/validate.py

[ROCm/rocprofiler-sdk commit: 07537b6231]
2024-04-12 17:30:34 -05:00
Mythreya 4f99edbad5 Page migration reporting (#651)
* Page migration reporting support

* Page migration: Update parser and reporting

Container does not lave latest KFD header, so CI might fail

* Add kfd_ioctl.h

* Formatting

* Update get_key

- get key was not used (and shouldn't be), so delete it

* clang-tidy fixes

* Tests for page migration

* Apply suggestions from code review

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update tests/bin/page-migration/CMakeLists.txt

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update page-migration test app

- add hipHostRegister to register mmap'ed allocation with HIP
- misc cleanup and reorg
- remove HSA_XNACK=1 from test env

* Update lib/rocprofiler-sdk/tests/page_migration.cpp

- fix compilation error

* Minor updates (reorg, rename)

* Page migration reporting support

* Page migration: Update parser and reporting

Container does not lave latest KFD header, so CI might fail

* Update page migration tests, fix trigger types

* Page Migration Tracing Support Refactoring (#753)

* Reorganization

* Update page migration init/fini

* Formatting

* Update page_migration.cpp

- change logging severity

* Skip test if KFD does not support page migration reporting

* Rework skipping test if KFD does not support page migration

* Fix event trigger enum values

* Fix clang-diagnostic-unused-const-variable

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>

[ROCm/rocprofiler-sdk commit: fd3d97287c]
2024-04-12 15:51:44 -05:00
Jonathan R. Madsen 03fb9ace21 CTest Environment Update (#756)
* Update test/tools/json-tool.cpp

- push/pop ppid as external correlation id instead of pid

* Update environment variables for tests and samples

* Revert to old CDash dashboard in run-ci.py

* Revert to new CDash dashboard in run-ci.py

[ROCm/rocprofiler-sdk commit: 3eaa678054]
2024-04-12 08:40:00 -05:00
Jonathan R. Madsen 5e8a3b4f16 Callback tracing for kernel dispatches + External correlation ID request service (#682)
* Support ROCPROFILER_CALLBACK_TRACING_KERNEL_DISPATCH

* Fix doxygen

* Update callback tracing

- temporary hacks for kind operation name and iterate kind operations

* Update source/include/rocprofiler-sdk

- introduce sequence id for kernel dispatches

* Update lib/rocprofiler-sdk (seq id)

- support sequence id passing

* Update tests (seq id)

- testing for sequence ids

* Cleanup include/rocprofiler-sdk/fwd.h

* Misc cleanup

* External Correlation ID Request Service (#699)

* External correlation ID request service

- callback requesting an external correlation ID instead of fetching from top of pushed external correlation ID stack

* Update external correlation id request support

- pass internal correlation ID in callback
- async copy generates a correlation ID if none already exists
- added external correlation ID request support for scratch memory tracing
- updated scratch memory tracing to use tracing:: functions

* Update hsa/queue.hpp

- new line at EOF

* Misc tweaks

- remove unnecessary logging in agent.cpp
- correlation_id::add_ref_count check for retirement
- finalization check in HSA queue AsyncSignalHandler

* Improve assertion failure logging in misc tests

* Update include/rocprofiler-sdk/fwd.h

- remove rocprofiler_record_counter_header_t

* Move lib/rocprofiler-sdk/tracing.hpp into lib/rocprofiler-sdk/tracing/ folder

* Update lib/rocprofiler-sdk/hsa/*

- hsa::get_hsa_status_string
- queue_info_session.hpp header
- rocprofiler_packet.hpp

* Update lib/rocprofiler-sdk/{counters,hip,marker}

- execute_phase_exit_callbacks tweaks
- queue_info_session tweaks

* Move rocprofiler_kernel_dispatch_operation_t to include/rocprofiler-sdk/fwd.h

* Update rocprofiler_buffer_tracing_kernel_dispatch_record_t

- add operation field and thread_id field

* Add lib/rocprofiler-sdk/kernel_dispatch

- enum <-> string mapping for kernel dispatch
- tracing implementations

* Update lib/rocprofiler-sdk/CMakeLists.txt

- tracing and kernel dispatch sub-directories

* Update lib/rocprofiler-sdk/{buffer,callback}_tracing.cpp

- invoke rocprofiler::kernel_tracing functions

* Update tests/common/serialization.hpp

- support operation and thread_id fields for rocprofiler_buffer_tracing_kernel_dispatch_record_t

* Update tests/tools/json-tool.cpp

- use external correlation id request service

* Rename sequence_id to dispatch_id

[ROCm/rocprofiler-sdk commit: 56030018dc]
2024-04-11 19:49:49 -05:00
Jonathan R. Madsen 95acc01042 Fix code_object_operation_t and memory_copy_operation_t enums (#751)
- enums for operations should not contain callback/buffer tracing categorization
- e.g. ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT_LOAD should be ROCPROIFLER_CODE_OBJECT_LOAD

[ROCm/rocprofiler-sdk commit: 0f5c575435]
2024-04-11 18:52:13 -05:00
Gopesh Bhardwaj 6076c751a3 hsa multiqueue application (#618)
* hsa multiqueuw application

* cmake formatting (cmake-format) (#619)

Co-authored-by: bgopesh <7112102+bgopesh@users.noreply.github.com>

* source formatting (clang-format v11) (#620)

Co-authored-by: bgopesh <7112102+bgopesh@users.noreply.github.com>

* comppialtion fix

* Update tests/bin/CMakeLists.txt

Reorder `add_subdirectory` to fix (recurrent) issues with ROCTx in `CMAKE_BUILD_RPATH`

* addressing early feedback

* cmake updates

* more cmake updates

* adding queue dependency test

* updating test

* test updates

* removed hsa_api_trace header

* reformating headers to prevent clang from reordering

* Fixing packaging

* Fixes for hangs

* source formatting (clang-format v11) (#676)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

* cmake formatting (cmake-format) (#673)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

* structure change

* cmake formatting (cmake-format) (#680)

Co-authored-by: bgopesh <7112102+bgopesh@users.noreply.github.com>

* Adding kernel trace to test hang fix

* rebased and fixed kernel-trace test

* rhel clang fixes

* source formatting (clang-format v11) (#685)

Co-authored-by: bgopesh <7112102+bgopesh@users.noreply.github.com>

* Update lib/rocprofiler-sdk-tool/helper.hpp

- remove suppression of -Wshadow

* Update tests/bin/hsa-queue-dependency/CMakeLists.txt

- cleanup unnecessary code
- GPU_LIST -> GPU_TARGETS

* GPU_LIST -> GPU_TARGETS

* Remove installation of test executable and libraries

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bgopesh <7112102+bgopesh@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 348d740388]
2024-04-09 10:19:16 -05:00
Mythreya fb1b61d79a Add support for scratch reporting (#523)
* Add ToolsApiTable

Add ToolsApiTable wrapping for
scratch memory tracking

* Add initial support for scratch memory tracking

Buffering is implemented

* cmake formatting (cmake-format) (#525)

Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com>

* source formatting (clang-format v11) (#524)

Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com>

* Add callback tracing for scratch

Fixed the error where scratch tracking init was called irrespective of whether any client requested for it

* Apply suggestions from code review

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>

* Fix tools api copy/update

Table were saved/updated incorrectly in previous
commit. Also adds passing user data through the callback

* Fix OpKind sequence for scratch tracking

Previously scratch was using OpKind from rocprofiler-sdk, but
templates were instantiated using API ID. These differ by 1

* Integration tests for scratch reporting

Added buffer and callback integration tests for scratch reporting

* source formatting (clang-format v11) (#550)

Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>

* cmake formatting (cmake-format) (#551)

Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>

* python formatting (black) (#549)

Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>

* CI fixes

* source formatting (clang-format v11) (#554)

Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>

* Update api

Rebase on main and updates based on PR feedback

* Update scratch reporting and address PR comments

- Added agent id to buffer records
- Updated `test_internal_correlation_ids` - Is almost identical to
  one in async-copy
- Updated scratch test to check for agent id
- Updated queue id serialization in callback records (prints
  handle as nested key)
- Remove `marker_api_traces` from scratch `test_internal_correlation_ids`
  validation test
- Rename `amd_tools_api` to `scratch_memory`
- Added doxygen comments
- Remove scratch callback from `tool.cpp`
- Replace assert with `LOF_IF` in `scratch_memory.cpp`

* Update tools table

Changed to match up with changes to hsa tables in main branch

* Rework scratch memory structure

* Update tests

- Added suggestions from PR review, and updated tests accordingly

* Misc cleanup

* Update scratch test

As of Apr 4th, `hsa_amd_agent_set_async_scratch_limit` is disabled.

Note,
> This API: `hsa_amd_agent_set_async_scratch_limit` is currently
> disabled. We need some changes in CP firmware to be able to do this
> and these changes are not ready yet.
> With the current code, you will also not get notifications for
> alternate-scratch allocations because this feature has been disabled
> while CP firmware is making additional changes
> We are hoping to have that feature enabled by ROCm-6.3

* Minor update to lib/rocprofiler-sdk/internal_threading.*

- delay destruction of shared_ptrs of the tasks to prevent rare (but possible) data race on the destruction of the shared_ptr

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 4fa165ec1a]
2024-04-05 20:32:57 -05:00
Jonathan R. Madsen 043db3b3a4 Use small_vector for API iterate_args (#597)
* Use small_vector for API iterate_args

- replace dim3 value arguments with rocprofiler_dim3_t
  - dim3 has a non-trivial destructor
- common::mpl::unqualified_type
- common::stringified_argument_array_t<N> alias
- assert_public_data_type_properties()
- common::container::small_vector<T>::at function
- stringize returns small_vector<stringified_argument>
  - stack allocated vector
- remove has_pc_sampling condition (HSA, HIP)
  - this will be handled in queue interception

* Misc tweaks

[ROCm/rocprofiler-sdk commit: 8591ed1c96]
2024-03-13 07:36:55 -05:00
Jonathan R. Madsen 407fc57ede Shared Library Constructor (rocprofv3 deadlock fix) (#599)
* Moved tests/apps to tests/bin

* Renamed cmake project in tests/bin

* Update samples

- Use ROCPROFILER_DEFAULT_FAIL_REGEX
- tweaks to stdout messages

* Update tests

- Use ROCPROFILER_DEFAULT_FAIL_REGEX

* Add tests/lib

- libraries with HIP code

* Update PTL submodule

- remove atexit delete of thread_id_map

* Update cmake/rocprofiler_options.cmake

- Set ROCPROFILER_DEFAULT_FAIL_REGEX

* Update common lib: env + logging

- improved customization of logging settings
- default to disabling logging to files
- install failure handler for rocprofv3
- set_env support in environment.*

* Add lib/rocprofiler-sdk/shared_library.cpp

- shared library constructor

* Update lib/rocprofiler-sdk-tool/tool.cpp

- destructor thread safety
- convert callback_name_info and buffered_name_info to pointers
- install failure handler for logging

* Add tests/bin/hip-in-libraries

- hip-in-libraries is an exe which uses two shared libraries where each shared library contains HIP kernels
  - used for testing deadlocking within __hipRegisterFatBinary

* Update bin/rocprofv3

- reorganized the env variables
- use exec to launch command
- set ROCPROFILER_LIBRARY_CTOR=1

* Add tests/rocprofv3/tracing-hip-in-libraries

- uses hip-in-libraries exe for exe which uses shared libraries to launch HIP kernels

* Update bin/rocprofv3

- fix counter collection (no exec)

* Update lib/rocprofiler-sdk-tool/tool.cpp

- replace "Kernel-Name" with "Kernel_Name"

* Update lib/rocprofiler-sdk/registration.cpp

Use RTLD_LOCAL instead of RTLD_GLOBAL for env libraries

* Update tests/rocprofv3

- replace "Kernel-Name" with "Kernel_Name"

* Update tests

- vector-ops (bin) stream syncs + runs with 4 queues per device
- improve counter-collection/input1 validation
- rocprofv3/tracing-hip-in-libraries does not do sys-trace
- improved validation script for tracing-hip-in-libraries
- updated dispatch_callback in json-tool.cpp following reworking of prototypes for counter collection

* Update samples/counter_collection

- updated dispatch_callback(s) and record_callback(s) following reworking of prototypes

* Update bin/rocprofv3

- reorganized help menu
- added options for sub-HSA tables
- added --hip-runtime-trace
- changed --hip-trace to include --hip-compiler-trace

* Update lib/rocprofiler-sdk-tool

- improved kernel filtering
- removed arch_vgpr, accum_vgpr, sgpr code (in rocprofiler-sdk)
- fixed issue with counter-collection w/o tracing
- added support for fine grained HSA API tracing
- removed directly linking to HSA-runtime

* Update lib/rocprofiler-sdk/agent.cpp

- rocp_agents != hsa_agents is non-fatal when ROCPROFILER_BUILD_CI=OFF (CMake option)

* GPR (vector and scalar) info in kernel symbol data

- rocprofiler_callback_tracing_code_object_kernel_symbol_register_data_t contains general purpose register info

* Header include order fix

- Include repo headers first
- Third party library headers next
- standard library headers last

* Update dispatch profiling public API

- introduce rocprofiler_profile_counting_dispatch_data_t
- change signature of rocprofiler_profile_counting_dispatch_callback_t and rocprofiler_profile_counting_record_callback_t
- provide rocprofiler_user_data_t pointer in dispatch callback
- provide rocprofiler_user_data_t value (from dispatch cb) in record callback

* Update tests/bin/CMakeLists.txt

- fix add_subdirectory(hip-in-libraries) order

* Update VERSION

- bump to 0.2.0 in prep for AFAR

[ROCm/rocprofiler-sdk commit: 7b6d3c70bd]
2024-03-07 22:21:26 -06:00
Jonathan R. Madsen d820cf3018 Update rocprofiler_query_available_agents(...) (#596)
* Agent info version

* Complete implementation

- revert "rocprofiler_iterate_agents" to "rocprofiler_query_available_agents"

* Misc tweaks

- update rocprofiler_query_available_agents impl

* Update include/rocprofiler-sdk/agent.h

- Fix undocumented param for rocprofiler_query_available_agents

[ROCm/rocprofiler-sdk commit: 1d33d4cf78]
2024-03-06 02:17:40 -06:00
Jonathan R. Madsen cb6b79c323 Fix rocprofiler_iterate_callback_tracing_kind_operation_args for HIP compiler callbacks (#532)
* Fix HIP compiler iterate args

- `include/rocprofiler-sdk/hip/api_args.h`
  - replace struct fields named "f" with "func"
  - replace hip stream fields named "hStream" with "stream"
- `lib/rocprofiler-sdk/callback_tracing.cpp`
  - iterate_args for HIP compiler table
- `lib/rocprofiler-sdk/registration.cpp`
  - fix warning about roctx num_tables
- `lib/rocprofiler-sdk/hip/hip.def.cpp`
  - replace struct fields named "f" with "func"
  - replace hip stream fields named "hStream" with "stream"
- `lib/rocprofiler-sdk/{hip,hsa,marker}/utils.hpp`
  - improve `stringize_impl`
- `lib/rocprofiler-sdk/hsa/code_object.cpp`
  - remove stale commented out code
- `lib/rocprofiler-sdk/hsa/queue_controller.*`
  - destory_queue -> destroy_queue
- `tests/tools/json-tool.cpp`
  - improve parallelism in tool_tracing_callback
  - serialize the marker api args
  - only invoke rocprofiler_iterate_callback_tracing_kind_operation_args in exit phase
- `samples/counter_collection/CMakeLists.txt`
  - reduce timeout on tests to 120 seconds

* Update lib/rocprofiler-sdk/hsa/utils.hpp

- disable dereference of double pointer in stringize_impl

* Update lib/common

- indirection_level in mpl.hpp
- stringize_arg.hpp

* Rework rocprofiler_iterate_callback_tracing_kind_operation_args

- provide more information in rocprofiler_callback_tracing_operation_args_cb_t
- support specifying the dereference level to account for output paramters

[ROCm/rocprofiler-sdk commit: 1bb94add11]
2024-03-01 01:46:07 -06:00
Jonathan R. Madsen 15302ff11d C compatibility for public headers (#566)
* C compatibility for public headers

- add tests/tools/c-tool.c
  - builds a tool (which does nothing) with C language
  - ensures that tool can be compiled in C
- add tests/c-tool/CMakeLists.txt
  - ensures that tool library build from C is a valid tool
- rocprofiler_counter_info_v0_t is_derived is int instead of bool
  - C does not have bool unless <stdbool.h> is included
- add `include/rocprofiler-sdk/hsa/api_trace_version.h
  - handles providing HSA_*_TABLE_(MAJOR|STEP)_VERSION values if compiled from C
- cmake define in version.h.in for ROCPROFILER_HSA_*_TABLE_(MAJOR|STEP)_VERSION
  - HSA table versions compiled with
- use rocprofiler_(hsa|hip|marker)_api_no_args struct to handle incompatibility b/t empty structs in C vs. C++ (size of 0 vs. size of 1)
- extern "C" in include/rocprofiler-sdk/{hsa,hip,marker}/api_args.h
- fixed spelling error: derrived -> derived
- scope YY_NO_INPUT compile definition to lib/rocprofiler-sdk/counters/parser/*

* Revert CDash dashboard

[ROCm/rocprofiler-sdk commit: a1267e1fd2]
2024-02-29 23:49:54 -06:00
Jonathan R. Madsen a360de4550 Correlation ID Retirement + misc (#527)
* Correlation ID Retirement

- include/rocprofiler-sdk/buffer_tracing.h
  - add rocprofiler_buffer_tracing_correlation_id_retirement_record_t
- include/rocprofiler-sdk/fwd.h
  - ROCPROFILER_BUFFER_TRACING_CORRELATION_ID_RETIREMENT
- lib/rocprofiler-sdk/buffer_tracing.cpp
  - kind string for correlation id retirement
- lib/rocprofiler-sdk/buffer.hpp
  - emplace returns bool
- lib/rocprofiler-sdk/registration.cpp
  - pass lib_instance to copy_table functions
- lib/rocprofiler-sdk/context/context.*
  - update correlation_id struct
    - make ref_count private
    - {get,add,sub}_ref_count() functions
      - sub_ref_count() performs correlation id retirement
    - use stack for "latest" thread-local correlation id
- lib/rocprofiler-sdk/hip/hip.*
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - return in iterate_args
  - handle table instance in copy_table
- lib/rocprofiler-sdk/hsa/hsa.*
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - return in iterate_args
  - handle table instance in copy_table
- lib/rocprofiler-sdk/marker/marker.*
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - return in iterate_args
  - handle table instance in copy_table
- lib/rocprofiler-sdk/hsa/async_copy.cpp
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - handle table instance in async_copy_init / async_copy_save
- lib/rocprofiler-sdk/hsa/queue.cpp
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - tweak to external correlation id mapping in WriteInterceptor
- tests/async-copy-tracing/validate.py
  - check retired_correlation_ids
- tests/common/serialization.hpp
  - support rocprofiler_buffer_tracing_correlation_id_retirement_record_t
- tests/kernel-tracing/validate.py
  - check retired_correlation_ids
- tests/common/CMakeLists.txt
  - perfetto external project
- tests/common/perfetto.hpp
  - perfetto categories + aliases
  - add_perfetto_annotation
  - metaprogramming helpers
- tests/tools/CMakeLists.txt
  - link to tests-perfetto
- tests/tools/json-tool.cpp
  - demangling functions
  - serialization of marker API callback args
  - reduce parallel bottleneck in tool_tracing_callback
  - support correlation id retirement
  - Multiple threads for buffers
  - Support ROCPROFILER_TOOL_CONTEXTS_EXCLUDE env variable
  - write_perfetto() function

* Update tests/rocprofv3/tracing/validate.py

- tweak test_hsa_api_trace

* Update PTL submodule

- fixes for data race during destruction of task

* Update lib/rocprofiler-sdk/buffer.*

- unique_buffer_vec_t uses std::unique_ptr instead of allocator::unique_static_ptr_t

* Reduce timeouts in counter collection samples [skip ci]

* Update tests/tools/json-tool.cpp

- tweak demangle(string_view, int*) -> demangle(string_view, int&)

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- move sub_ref_count() to later in async_copy_handler to delay retirement slightly more

[ROCm/rocprofiler-sdk commit: 875f53b608]
2024-02-23 10:30:33 -06:00
Jonathan R. Madsen f760a3ceaa Updates/fixes for CI, docs, tests, samples, and common library (#528)
- .github/workflows/continuous_integration.yml
  - apt-get update before apt-get install
  - remove libgtest-dev
  - actions-comment-pull-request: v2.4.3 -> v2.5.0
- .github/workflows/formatting.yml
  - create-pull-request: v5 -> v6
- cmake/rocprofiler_options.cmake
  - remove unused ROCPROFILER_DEBUG_TRACE and ROCPROFILER_LD_AQLPROFILE options
- samples/counter_collection/callback_client.cpp
  - corr_id field renamed to correlation_id
- samples/counter_collection/client.cpp
  - corr_id field renamed to correlation_id
- include/rocprofiler-sdk/fwd.h
  - In rocprofiler_record_counter_t: rename corr_id field to correlation_id
  - doxygen fixes
- lib/common/utility.*
  - remove get_accurate_clock_id_impl
  - timestamp_ns() defaults to CLOCK_BOOTTIME
- lib/rocprofiler-sdk/counters/core.cpp
  - fix spelling mistake: extrenal -> external
  - corr_id field renamed to correlation_id
- lib/rocprofiler-sdk-tool/tool.cpp
  - fix destruction of static tool::output_file before finalization
- scripts/update-docs.sh
  - define PROJECT_NAME
- tests/async-copy-tracing/validate.py
  - init_time and fini_time checks
  - hip_api_traces, marker_api_tracing
- tests/common/serialization.hpp
  - fix save function for rocprofiler_record_counter_t following rename of corr_id to correlation_id
- tests/kernel-tracing/validate.py
  - init_time and fini_time checks
  - relax test_total_runtime range
- tests/rocprofv3/tracing/CMakeLists.txt
  - remove -M from rocprofv3-test-systrace-execute
  - exclude test_hsa_api_trace in rocprofv3-test-systrace-validate due to HIP API tracing
- tests/rocprofv3/tracing/validate.py
  - update test_kernel_trace to accept mangled or demangled
- tests/tools/json-tool.cpp
  - remove use of GLOG
  - include init_time and fini_time
  - write_json(...) function

[ROCm/rocprofiler-sdk commit: 0d939edbba]
2024-02-22 00:16:43 -06:00