* public codeobj info
* Restructure code object source code file layout
* Update get_unloaded_code_objects + add iterate_loaded_code_objects
* Remove get_unloaded_code_objects from visible internal API
- iterate_loaded_code_objects + functor which filters on the hsa_executable_t effectively reproduces this behavior
* Whitespace removal
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Remove -Wno-missing-field-initializers
- Compiler errors if missing field initializers
* Update lib/rocprofiler-sdk/counters/evaluate_ast.cpp
- copy over dispatch ID in perform_reduction/evaluate
* Fix drm include for OpenSUSE
- uses libdrm/drm.h instead of drm/drm.h
* Fix "List Files" step in CI workflows
* Fix "List Files" step in CI workflows
* Minor fix
Removal of HSA from counter collection
Tests for AQL
Updated counter collection client to build profiles in tool init
* Rebased
* Debug printing
* Formatting
* More format
* fix shadowing
---------
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
* Page migration reporting support
* Page migration: Update parser and reporting
Container does not lave latest KFD header, so CI might fail
* Add kfd_ioctl.h
* Formatting
* Update get_key
- get key was not used (and shouldn't be), so delete it
* clang-tidy fixes
* Tests for page migration
* Apply suggestions from code review
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update tests/bin/page-migration/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update page-migration test app
- add hipHostRegister to register mmap'ed allocation with HIP
- misc cleanup and reorg
- remove HSA_XNACK=1 from test env
* Update lib/rocprofiler-sdk/tests/page_migration.cpp
- fix compilation error
* Minor updates (reorg, rename)
* Page migration reporting support
* Page migration: Update parser and reporting
Container does not lave latest KFD header, so CI might fail
* Update page migration tests, fix trigger types
* Page Migration Tracing Support Refactoring (#753)
* Reorganization
* Update page migration init/fini
* Formatting
* Update page_migration.cpp
- change logging severity
* Skip test if KFD does not support page migration reporting
* Rework skipping test if KFD does not support page migration
* Fix event trigger enum values
* Fix clang-diagnostic-unused-const-variable
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
* Update lib/rocprofiler-sdk/context/*
- create correlation_id.{hpp,cpp} and moved implementation into these files instead of in context.{hpp,cpp}
* Update lib/rocprofiler-sdk/thread_trace/att_core.hpp
- fixed header includes
* Update lib/common/utility.hpp (runtime sizeof)
- added compute_runtime_sizeof<T>() function to set the "size" field to be the offset of the "reserved_padding" field if one exists
* Fix to compute_runtime_sizeof
* rocprofiler-sdk-codeobj: use pkg-config to find libdw / libelf
The current version of source/lib/rocprofiler-sdk-codeobj/CMakeLists.txt
adds -ldw and -lelf to target_link_libraries. However, on a system where
libdw-dev / libelf-dev is missing, the cmake configuration phase will
run properly and a compile time error will eventually be raised.
This patch changes the CMakelists.txt to search for libelf libdw and
configures the target as needed. Systems missing the required support
should report an error when running cmake instead of in the middle of
the compilation.
* Use INTERFACE targets
* Resolve issues with Findlib{dw,elf}
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
- questionable data race within std::regex in CI
- simplify rocprofiler::tool::format
- set config::tmp_directory to default to output_path
- fs::create_directories for tmp_file
- rework get_file_name(...) and compose_tmp_file_name(...) in tool.cpp
- enums for operations should not contain callback/buffer tracing categorization
- e.g. ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT_LOAD should be ROCPROIFLER_CODE_OBJECT_LOAD
* Enable INFO logging on retried CI jobs
* Update lib/rocprofiler-sdk/async_copy.cpp
- rework active_signals
- make hsa_signal_t member variable
- remove sync from destructor
- replace _is_set with atomic counter
- timeout of 30 seconds hsa_signal_wait
- switch from relaxed to scacquire/screlease memory ordering
- improve logging and error handling
- destroy hsa signal in active_signals in async_fini
* Update lib/rocprofiler-sdk/async_copy.cpp
- active_signals::create
- change initial value of signal to 1 instead of value of completion signal
- change condition trigger of signal callback
* Update tests/counter-collection/validate.py
* Update lib/rocprofiler-sdk/async_copy.cpp
- improved logging
- fix hsa_signal_wait_scacquire_fn check
* Cleanup tests/lib/transpose/transpose.cpp
- remove huge comment block
* Appears to be working on MI200
Dependency Versions:
clr: f7b1398361 - compile mode: release
hsa-runtime: 4cd6c62f25dbbdbaa8580dd4ad8f388c98c508da - compile mode: RelWithDebug
* Update source/lib/rocprofiler-sdk/hsa/async_copy.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Format fix
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ammar ELWazir <aelwazir@hpe6u-21.amd.com>
* adding pandas and pytest to rquirements.txt
* setting up requrements.txt
* Update requirements
- formatting packages
- remove packages not directly used by rocprofiler-sdk
* Update cmake formatting, linting, and options
- if BUILD_CI -> force BUILD_DEVELOPER and BUILD_WERROR
- support python installed clang-format and python installed clang-tidy
* Update build.sh
- split into install-deps.sh and install-apt-deps.sh
* Improve code coverage
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Added first ATT API
* Finalizing thread trace API
* Fixing more rebase conflicts
* Added codeobj disassembly sample
* Fixing merge issues with rebase [2]
* Adding ATT packets
* Implemented thread trace intercept
* Moved codeobj parser to same repo as rocprofiler
* Moved thread trace to new API
* Fixing merge conflicts
* Fixing more merge conflicts
* Adding thread trace packet reuse
* Merged aql_profile_v2 headers
* Linked ATT sample to aqlprofile
* Updated decoder to include non-loaded codeobjs
* Implemented ISA decoder into ATT sample
* Added marker_id to vaddr
* Updating aql_profile_v2 API to memcpy
* Updating thread trace API to include 64bit markers. Using the result of ISA matching.
* Added instruction type and cycles summary
* Updated sample with selection of kernel by kernel_object
* Added option to copy from memory kernels
* Moved tool_data in thread_trace to dynamic alloc
* Restoring hsa.cpp
* Fixed ATT sample crash. General improvements.
* Moved codeobj library to outside src/
* Updated license header
* Moved codeobj_capture to camelcase
* Solving some more merge conflicts
* Update samples/advanced_thread_trace/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update samples/advanced_thread_trace/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update samples/code_object_isa_decode/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update source/lib/rocprofiler-sdk/thread_trace/CMakeLists.txt
* Removing unused parameter check
* Adding const to isEmpty
* Removing unused warning
* Adding libdw-dev to requirements
* Running clang-format
* Commenting out new aql calls
* Clang format
* Unused variable fix
* Adding codeobj-decoder coverage
* Commenting out threadtrace
* Update samples/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* P
* WOverloaded
* Addressing clang-tidy
* Virtual destructor on ttracer class
* Corr id
* Fixing code source format
* Update CMakeLists.txt
* Build fixes
* Update source/lib/rocprofiler-sdk-codeobj/code_object_track.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Fix shadowing
* Update CMakeLists.txt
* Update samples/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
* Add ToolsApiTable
Add ToolsApiTable wrapping for
scratch memory tracking
* Add initial support for scratch memory tracking
Buffering is implemented
* cmake formatting (cmake-format) (#525)
Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com>
* source formatting (clang-format v11) (#524)
Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com>
* Add callback tracing for scratch
Fixed the error where scratch tracking init was called irrespective of whether any client requested for it
* Apply suggestions from code review
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
* Fix tools api copy/update
Table were saved/updated incorrectly in previous
commit. Also adds passing user data through the callback
* Fix OpKind sequence for scratch tracking
Previously scratch was using OpKind from rocprofiler-sdk, but
templates were instantiated using API ID. These differ by 1
* Integration tests for scratch reporting
Added buffer and callback integration tests for scratch reporting
* source formatting (clang-format v11) (#550)
Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>
* cmake formatting (cmake-format) (#551)
Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>
* python formatting (black) (#549)
Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>
* CI fixes
* source formatting (clang-format v11) (#554)
Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>
* Update api
Rebase on main and updates based on PR feedback
* Update scratch reporting and address PR comments
- Added agent id to buffer records
- Updated `test_internal_correlation_ids` - Is almost identical to
one in async-copy
- Updated scratch test to check for agent id
- Updated queue id serialization in callback records (prints
handle as nested key)
- Remove `marker_api_traces` from scratch `test_internal_correlation_ids`
validation test
- Rename `amd_tools_api` to `scratch_memory`
- Added doxygen comments
- Remove scratch callback from `tool.cpp`
- Replace assert with `LOF_IF` in `scratch_memory.cpp`
* Update tools table
Changed to match up with changes to hsa tables in main branch
* Rework scratch memory structure
* Update tests
- Added suggestions from PR review, and updated tests accordingly
* Misc cleanup
* Update scratch test
As of Apr 4th, `hsa_amd_agent_set_async_scratch_limit` is disabled.
Note,
> This API: `hsa_amd_agent_set_async_scratch_limit` is currently
> disabled. We need some changes in CP firmware to be able to do this
> and these changes are not ready yet.
> With the current code, you will also not get notifications for
> alternate-scratch allocations because this feature has been disabled
> while CP firmware is making additional changes
> We are hoping to have that feature enabled by ROCm-6.3
* Minor update to lib/rocprofiler-sdk/internal_threading.*
- delay destruction of shared_ptrs of the tasks to prevent rare (but possible) data race on the destruction of the shared_ptr
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Add debug printing for write interceptor injected packets
Adds debug printing for write interceptor injected
packets. All packets that pass through the write
intercepter while enabled will be printed.
Only executes/prints when the environment variable
GLOG_v is set to 2 or higher (otherwise it is a no-op
and the expression is not evaluated).
* source formatting (clang-format v11) (#675)
Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>
* Changes on fmt location
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>
* Wait for all mem copies to complete before destroying.
* Update source/lib/rocprofiler-sdk/hsa/async_copy.cpp
Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
* Update async_copy.cpp
---------
Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
- update lib/rocprofiler-sdk/internal_threading.*
- use PTL::TaskManager instead of PTL::TaskGroup
- easier to handle for our needs
- eliminate data race in rocprofiler_flush_buffer
- combine memory management of TaskManager and ThreadPool
* Fix agent node id + randomize offset id
- fixes the node_id value
- randomizes a constant offset for the id.handle values
- switch to using node ids in rocprofiler-sdk-tool library
- update tests related to agents
* Logical node id
- sequential node id values from 0 to (N-1) where N is the number of agents
* Handle hsa_queue_destroy after finalization
- fixes issue where hsa_queue_destroy(...) is invoked after rocprofiler-sdk has finalized
- hsa::get_queue_controller() returns pointer
- if queue controller is a null pointer, skip invoking QueueController::destroy_queue
* Update HIP/HSA/marker update_table logging
* Update rocprofv3 tests
- remove HSA_TOOLS_LIB env variable
- remove setting ROCPROFILER_LOG_LEVEL env variable
- add timeouts to tests which are missing them
* Disable thread sanitizer deadlock detection
* Update CI workflow
- rename vega20-ubuntu job to core-ci
- enable navi32 in core-ci and sanitizers
* Update run-ci.py
- set gcovr html medium and high threshold
* Update lib/rocprofiler-sdk/hsa/queue_controller.cpp
- remove this capture from enable/disable serialization
* Update lib/rocprofiler-sdk/hsa/{hsa_barrier,profile_serializer}.*
- hsa_barrier::set_barrier accepts const-ref to queue map
- profile_serializer::enable and profile_serializer::disable accept const-ref to queue map
* Logging for HIP/HSA/marker/profile_serializer
* Logging for HIP/HSA/marker/queue_controller
* Improve test_retired_correlation_ids asserts
* Fix tests/counter-collection/validate.py
- scale expected SQ_WAVES counter value based on warp size of GPU
* Tweak github comment for code coverage
* Remove gcovr html high/medium threshold args
* Fix tests/counter-collection/validate.py
- round before casting to int in test_counter_values
* operator bool for profile_serializer
- only wait on CV if profile_serializer is used
* Logging updates (profile_serializer + code_object)
* Update counter-collection validate.py
* QueueController does not wait on CV if finalizing/finalized
* Update CI workflow
- remove navi32 from core job
* Improve HIP/HSA/marker tracing get_functor/functor
- remove lambda wrapper around functor
* Update lib/rocprofiler-sdk/hsa/queue_controller.cpp
- do not acquire cvmutex lock during finalization
* Update lib/rocprofiler-sdk/hsa/hsa_barrier.*
- move ctor and dtor to implementation
- skip signal store screlease and destroy if already finalized
* Update CI workflow
- remove navi32 runners
* bwelton fixes for hangs
* CMake improvements + simplified demangle
- remove amd-comgr from common target (and thus removed from roctx DT_NEEDED)
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
* Improve error checks related to context create/start/stop/is_valid
* Bump version to 0.2.1
* Track number of kernels associated with correlation id
- add atomic kernel counter variable to context::correlation_id
* Update lib/rocprofiler-sdk/hsa/queue.cpp
- apply the +/- kernel count
* Fix tracing context domain logic for operations
- logic error: domain enabled (all operations all implicitly enabled) + domain enabled for subset of operations resulted in only explicitly enabled operations being treated as enabled
- domain_context: split single bitset for operations in all domains into array of bitsets for each domain
* Update lib/common/mpl.hpp
- assert_false for static_asserts in if constexpr expressions
* Update lib/rocprofiler-sdk/tests/contexts.cpp
- Tests for validating logic regarding domain and operations for callback and buffer tracing
* Use small_vector for API iterate_args
- replace dim3 value arguments with rocprofiler_dim3_t
- dim3 has a non-trivial destructor
- common::mpl::unqualified_type
- common::stringified_argument_array_t<N> alias
- assert_public_data_type_properties()
- common::container::small_vector<T>::at function
- stringize returns small_vector<stringified_argument>
- stack allocated vector
- remove has_pc_sampling condition (HSA, HIP)
- this will be handled in queue interception
* Misc tweaks