* SDK: create CMake option for strict checks on CPU vs. GPU timestamps
- Configurating CMake with `ROCPROFILER_BUILD_CI_STRICT_TIMESTAMPS=ON` will enable fatal errors if dispatch/memcpy timestamps on GPU are outside of the start/end time from the CPU
- `ROCPROFIELR_BUILD_CI_STRICT_TIMESTAMPS` defaults to the value of `ROCPROFILER_BUILD_CI`
* Formatting
* Disable async_copy frequency scaling
* Disable profiling dispatch time frequency scaling
* Support runtime configuration via env variables
- ROCPROFILER_CI_FREQ_SCALE_TIMESTAMPS env variable will enable scaling the timestamps based on the hsa timestamp period
- ROCPROFILER_CI_STRICT_TIMESTAMPS env variable will enable strict timestamp checks
- when cmake is configured with ROCPROFILER_BUILD_CI_STRICT_TIMESTAMPS=ON, this env variable defaults to true
* ROCPROFILER_BUILD_CI_STRICT_TIMESTAMPS defaults to OFF
* Update cmake-target
* Common tracing::adjust_profiling_time
---------
Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>
Migrates profiler_serializer class in QueueController to have an instance per-agent instead of one globally. Other changes in this commit are to allow for maps of the queues associated with each agent to be passed to profiler_serializer when it is turned on/off. Existing test cases cover whether or not the kernels are serialized (multistream app). New test case added to show that this serialization only occurs on a per device level with a kernel launched on one device waiting for a value to be set on the other.
* Update tests/bin/transpose/transpose.cpp
- add hipMemGetInfo call to display the available vs. total memory on the GPU
* Update tests/rocprofv3/summary/validate.py
- Updated test_summary_display_data after addition of hipMemGetInfo to transpose test exe
* Tweak code coverage comment uploading
- create unique orphan branch per PR
- reduce quality of PNG files (85 -> 70)
* Revert some of code coverage comment uploading
- remove creation of unique orphan branch per PR
* Tweak code coverage comment uploading
- create unique orphan branch per PR
* Change all rocprofiler-X target names to rocprofiler-sdk-X
* Update rocprofiler-sdk-config.cmake
- fix install tree target names
- simplify logic for using find w/ components and find w/o components
* Update rocprofiler-sdk-roctx-config.cmake
- simplify logic for using find w/ components and find w/o components
* Update samples/intercept_table/CMakeLists.txt
- demonstrate/test use of `find_package(rocprofiler-sdk ... COMPONENTS ...)`
* rocprofv3: support specifying PMC counters via command line
- E.g. `rocprofv3 --pmc SQ_WAVES -- <app>`
* Update CHANGELOG
* updated rocprofv3 help and documentation
---------
Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>
* Max reduction operation bug fix & support reduction for derived counters
* CHANGELOG
* Added missing new line
---------
Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>
* Check to force tool to initialize the ctx id to zero.
* initialize rocprofiler_context_id_t with 0 in units tests
* changelog
---------
Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>
* dim args for select token now supports range of values instead of single integer
* dim args for select token now supports range of values instead of single integer
* modified raw_ast fmt::format
* CHANGELOG.md
* varible rename for select dim from set to map in suffix
* use : for giving range of values
* Update source/lib/rocprofiler-sdk/counters/parser/scanner.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* format
* Update CHANGELOG.md
* using map instead of unordermap for select_dimension_map
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>
Checks the decoding of AQL packets for SQ Waves by
launching a kernel, injecting the AQL packets, and
decoding the result. This does not use write interceptor
but does this check on a raw HSA stream with direct
injection.
Prevent's multiple setups of agent profiling on the same agent.
Fixes agent read context to only read agents that were setup.
Prevent copy of agent profiling internal data struct and reset
hsa_signal on move to prevent inadvertant delete.
* Renamed agent profiling service to device counting service
Name more aptly represents what agent profiling did (device wide
counter collection). Conversion of existing user code can be
performed by the following find/sed command:
find . -type f -exec sed -i 's/rocprofiler_agent_profile_callback_t/rocprofiler_device_counting_service_callback_t/g; s/rocprofiler_configure_agent_profile_counting_service/rocprofiler_configure_device_counting_service/g; s/agent_profile.h/device_counting_service.h/g; s/rocprofiler_sample_agent_profile_counting_service/rocprofiler_sample_device_counting_service/g' {} +
* Converted dispatch profile to dispatch counting service
* Debug for functioal counters test
* Minor changes for CI
* Minor fix
* More fixes for CI
* Update evaluate_ast.cpp
---------
Co-authored-by: Benjamin Welton <ben@amd.com>
* Adding start and end timestamp columns in csv
* Adding assert check for the counter timestamps
---------
Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>
A fixed sized std::array is used to store counter records in rocprofiler SDK. This limit was breached in SWDEV-484742. Upping the limit to 512 to be less likely to reach this limit again.
CSV output truncates doubles to ints when it shouldn't. Derived metrics
are (mostly) doubles and lose precision (or become worthless) if treated
as an int. Converted these to double to match the format we return from
rocprof-sdk.
Co-authored-by: Benjamin Welton <ben@amd.com>
* Update HIP ABI tracing
* Minor HIP abi.cpp updates
* Misc roctx updates (version.h + more)
* Common static thread-local template struct
- static_tl_object
- similar to static_object but with thread-local semantics
* rocprofiler-sdk/version.h updates
* Update for HIP_RUNTIME_API_TABLE_STEP_VERSION == {4,5,6}
* Fix roctx.cpp tweaks
* Enable queue interception with scratch reporting
Scratch reporting reports agent ID in buffer and callback records, but
HSA runtime provides only queue ID in the scratch callback.
This change enables queue interception when scratch reporting is requested
* Validation test for rocprofv3 + scratch-memory-trace
* Simplify checks for whether context is tracing a domain
* Update changelog
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>