* [SWDEV-518071] Return HSA not loaded status (device counter collection)
This status lets a caller tell whether they got no counters because of
a failure or because they tried to collect counters too early (as is
the case in the sample, which can attempt to collect counters before
HSA is initialized).
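
A minimal sketch of how a caller might act on the distinction described above. The `SampleStatus` enum, `SampleResult` struct, and `read_counters` function are hypothetical stand-ins for illustration, not the SDK's actual names or signatures.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical status codes; the real SDK's names and values may differ.
enum class SampleStatus
{
    Success,
    HsaNotLoaded,  // runtime not initialized yet: retrying later may succeed
    Error          // genuine failure: retrying is unlikely to help
};

struct SampleResult
{
    SampleStatus          status;
    std::vector<uint64_t> counters;
};

// Stand-in for a device counter read; `hsa_ready` models whether HSA is up.
inline SampleResult
read_counters(bool hsa_ready, bool device_ok)
{
    if(!hsa_ready) return {SampleStatus::HsaNotLoaded, {}};
    if(!device_ok) return {SampleStatus::Error, {}};
    return {SampleStatus::Success, {42}};
}

// Caller-side policy: only the "not loaded" state is worth retrying.
inline bool
should_retry(const SampleResult& result)
{
    return result.status == SampleStatus::HsaNotLoaded;
}
```

With a separate status, an empty counter list no longer has to be interpreted by guesswork: the caller can wait and retry in one case and report an error in the other.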
* Minor edit
* format
* [SWDEV-518081] Simplify Metric Loading (#243)
* [SWDEV-518324] Add AST update support
Allows ASTs to be updated (instead of being an unchangeable static
value). Adds a shared pointer return type so that static destructors or
modifications cannot invalidate AST definitions that may still be in
use. No functionality/use changes in this PR.
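
The shared pointer design described above can be sketched roughly as follows. `AstCache` and `AstMap` are hypothetical names, and the string-to-string map is only a placeholder for real AST definitions; the actual implementation may differ.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for AST definitions keyed by counter name.
using AstMap = std::map<std::string, std::string>;

class AstCache
{
public:
    // Readers receive a snapshot that stays valid even if the cache is
    // later updated or destroyed.
    std::shared_ptr<const AstMap> get() const
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        return m_current;
    }

    // Updates copy the current map, modify the copy, and swap the pointer;
    // snapshots already handed out are left untouched.
    void update(const std::string& name, const std::string& expr)
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        auto next     = std::make_shared<AstMap>(*m_current);
        (*next)[name] = expr;
        m_current     = std::move(next);
    }

private:
    mutable std::mutex            m_mutex;
    std::shared_ptr<const AstMap> m_current = std::make_shared<AstMap>();
};
```

The cost of this pattern is a full copy per update, which is acceptable when updates are rare and reads are frequent.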
* [SWDEV-518593] Add updatable dimension cache + fix string issues (#252)
Updates dimension cache to use the same design pattern as AST/Metrics.
Fixes the string scoping issue seen in ASTs, which appears here as well.
* Add rocprofiler_create_counter
Creates derived counters based on input from the API. This PR does three
things:
1. Adds the API + test case.
2. Validates that an AST can be constructed from the counter supplied.
3. Updates the metrics, AST, and dimension caches to include the new
   metric.
The metric should be available for use immediately after the call
completes. Because the ASTs are regenerated, this call should not be
made in performance-sensitive code.
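
A rough sketch of the validate-then-update flow listed above, under heavy simplification. `CounterRegistry` is hypothetical, the identifier check is only a stand-in for real AST construction, and the counter names are example values.

```cpp
#include <cctype>
#include <set>
#include <string>

struct CounterRegistry
{
    // Example base counters; a real registry is populated per agent.
    std::set<std::string> known = {"SQ_WAVES", "GRBM_GUI_ACTIVE"};

    // Stand-in for step 2: every identifier in the expression must name a
    // known counter. Real validation would build the AST instead.
    bool validate(const std::string& expr) const
    {
        std::string token;
        auto        flush = [&]() {
            const bool ok = token.empty() || known.count(token) > 0;
            token.clear();
            return ok;
        };
        for(char c : expr)
        {
            const auto uc    = static_cast<unsigned char>(c);
            const bool ident = std::isalpha(uc) || c == '_' ||
                               (std::isdigit(uc) && !token.empty());
            if(ident)
                token += c;
            else if(!flush())
                return false;
        }
        return flush();
    }

    // Steps 2-3: reject an invalid definition before any cache is touched,
    // then make the new counter visible to later lookups.
    bool create(const std::string& name, const std::string& expr)
    {
        if(known.count(name) > 0 || !validate(expr)) return false;
        known.insert(name);
        return true;
    }
};
```

Validating first means a rejected definition leaves every cache exactly as it was, and a successful one is usable by the very next call.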
* Suggestion fixes
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
* Minor tweak
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
* Fixes for comments
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
* Perfetto duration temp fix setup
* Add timestamp change amounts to ROCP Info
* Group kernel dispatch info by agent and queue ID before sorting, then perform midpoint interpolation on the sorted kernels
* Moved dispatch bins into the for-loop
* Fix compilation error by using const ref
* Modified for review comments
* Changed variable names
* Initial fix for runtime error in id_decode.hpp:set_dim_in_rec()
* Actual fix: corrected handling of the case where dim==1 (ROCPROFILER_DIMENSION_NONE)
* removing magic numbers
* minor fix
* fix for invalid bool value at runtime
* clang format
* build fix
---------
Co-authored-by: Welton, Benjamin <Benjamin.Welton@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
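
For the Perfetto duration fix above, the grouping and sorting step might look like the sketch below. The `Dispatch` struct and `group_and_fix` function are hypothetical. The reading of "midpoint interpolation" used here (when two neighbouring kernels in a bin overlap, move their shared boundary to the midpoint of the overlap) is an assumption and may not match the actual change.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct Dispatch
{
    uint64_t agent_id;
    uint64_t queue_id;
    uint64_t start_ns;
    uint64_t end_ns;  // assumed to be >= start_ns
};

using Bins = std::map<std::pair<uint64_t, uint64_t>, std::vector<Dispatch>>;

inline Bins
group_and_fix(const std::vector<Dispatch>& dispatches)
{
    // Bin dispatches by (agent, queue) so unrelated queues never interact.
    Bins bins;
    for(const auto& d : dispatches)
        bins[{d.agent_id, d.queue_id}].push_back(d);

    for(auto& entry : bins)
    {
        auto& bin = entry.second;
        std::sort(bin.begin(), bin.end(), [](const Dispatch& a, const Dispatch& b) {
            return a.start_ns < b.start_ns;
        });
        // Assumed interpolation: split any overlap between neighbours at
        // its midpoint so the two slices no longer overlap.
        for(std::size_t i = 1; i < bin.size(); ++i)
        {
            auto& prev = bin[i - 1];
            auto& next = bin[i];
            if(prev.end_ns > next.start_ns)
            {
                const uint64_t overlap_end = std::min(prev.end_ns, next.end_ns);
                const uint64_t mid = next.start_ns + (overlap_end - next.start_ns) / 2;
                prev.end_ns   = mid;
                next.start_ns = mid;
            }
        }
    }
    return bins;
}
```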
* Fix segfault on failure to query GPU name
* Format
* Review comments
* Format
* Review comment
---------
Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>
* Add debug printing statement to packet submission
Adds debug printing for packets submitted to the HSA queue in device
counting mode.
* Minor change
* Small fix
* formatting
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
* rocDecode API Tracing support
* Test bin file added to rocdecode. Need to add validate python methods
* Added option to not make rocDecode tests
* Added rocdecode and rocprofv3 tests
* Added csv test
* Address PR comments. Changed tests to use built-in rocstreambit decoder to remove ffmpeg dependency. Changed cmake option to disable tests rather than not build them. Tests work locally, but will fail until rocDecode is built with tracing enabled on CI
* Add option to avoid building rocdecode tests
* Added option to avoid building rocdecode bin file
* Support for rocJPEG API Trace
* Added newline to rocjpeg_version.h
* json-tool code added, initial test/bin commit
* Formatting
* Resolved rocjpeg bin test compilation errors
* Tests implemented. The Perfetto module currently results in errors, so the tests need to be rerun once it is fixed
* Formatting and compilation errors
* Minor fixes
* Copyright year update and minor fixes
* Doc update fix
* Added rocjpeg csv file in data
* Addresses review comments: Updated fixed Findroc.. and uses root directory as a hint, fixed documentation error, changed tables to use _CORE, minor style fixes
* Added rocdecode and rocjpeg to CI
* Removed rocdecode and rocjpeg from CI and added back build tests option
* Updated Cmake Files
* Added rocDecode and rocJPEG to CI
* Remove cmake line added in error
* Temporarily modified tests to pass if rocdecode or rocjpeg tracing are not supported for CI, cmake changes
* Added find_package for test
* Added back use of system rocDecode and rocJPEG, modifies system files to include prefix path
* Updated no-link to include INCLUDE_DIR/roc(decode|jpeg), added comments for tests
* Resolve merge conflicts and formatting
* Added regex find and replace instead of include for CI
* VAAPI package causing errors on Vega20
* Removed system rocjpeg and rocdecode use temporarily until cmake issues resolved
* Removed workflows regex
* Formatting and minor test modification
* Modified test for vega20
* Update rocDecode and rocJPEG cmake and tests
* Changelog
* Fix merge conflict
* Added back if-statements around add-tests since cmake-generator-expressions are resulting in errors when the packages are missing
* Removed if found statements, replaced with TARGET:EXISTS
* Skip json file for rocjpeg and rocdecode tests if not supported
* Add os import
---------
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Counter track for memory allocation is now a running sum showing total allocation
* Address review comments
* Update source/lib/output/generatePerfetto.cpp
Co-authored-by: Meserve, Mark <Mark.Meserve@amd.com>
* Updated to reflect review comments
* Fix compilation errors on CI
* remove braces on scalar
* Fix struct compilation issues
* Removed name_to_id for sanitizer
---------
Co-authored-by: Meserve, Mark <Mark.Meserve@amd.com>
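
The running-sum counter track from the memory allocation change above can be illustrated with a small sketch. `MemEvent` and `running_total` are hypothetical names; the real code emits Perfetto counter values rather than returning a vector.

```cpp
#include <cstdint>
#include <vector>

struct MemEvent
{
    uint64_t timestamp_ns;
    int64_t  delta_bytes;  // positive for an allocation, negative for a free
};

// Returns one counter value per event: the total bytes allocated after
// that event, rather than the size of the individual allocation.
inline std::vector<int64_t>
running_total(const std::vector<MemEvent>& events)
{
    std::vector<int64_t> totals;
    totals.reserve(events.size());
    int64_t total = 0;
    for(const auto& e : events)
    {
        total += e.delta_bytes;
        totals.push_back(total);
    }
    return totals;
}
```

Events are assumed to arrive in timestamp order; unsorted input would need sorting first for the totals to be meaningful.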
* rocprofv3: suppress agent info when no data collected
* Update output config serialization
- full serialization of output configuration
* Update rocprofiler-sdk-att/tests
- add version and soversion
- change output directory
- generate libatt_decoder_summary
- disable tests instead of removing them
* Update rocprofv3 command-line
- make --att-library-path hidden by default
- simplify check_att_capability
- reorder pc sampling options
- add hidden --echo option
- remove ROCPROF_LIST_AVAIL_TOOL_LIBRARY from preload
* Add new rocprofv3 tests for specifying the ATT library path
* Tweak to rocprofv3-test-hsa-multiqueue-att tests
* Update rocprofv3 tool to enable output with att
* Fix standalone test installation
* Revert to fetchcontent_makeavailable to fetchcontent_populate
* Revert tests/common/CMakeLists.txt
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* [DO NOT MERGE] Misc UUID updates
- this is WIP
* Agent visibility
- Support for ROCR_VISIBLE_DEVICES, HIP_VISIBLE_DEVICES, CUDA_VISIBLE_DEVICES, GPU_DEVICE_ORDINAL
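
A sketch of how the value of such a variable might be parsed into a device order. `parse_visible_devices` is a hypothetical helper. It handles integer indices only, and stopping at the first invalid entry is an assumption about the runtime's behavior; other accepted forms (such as UUIDs) are not covered.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Parses a comma-separated list of device indices, e.g. "2,0,1".
// The position in the result is the index the application sees; the value
// is the physical device index. Parsing stops at the first entry that is
// not a short non-negative integer (assumed behavior).
inline std::vector<int>
parse_visible_devices(const std::string& value)
{
    std::vector<int> order;
    std::size_t      pos = 0;
    while(pos <= value.size())
    {
        std::size_t comma = value.find(',', pos);
        if(comma == std::string::npos) comma = value.size();
        const std::string token = value.substr(pos, comma - pos);
        const bool        valid = !token.empty() && token.size() <= 9 &&
                           token.find_first_not_of("0123456789") == std::string::npos;
        if(!valid) break;
        order.push_back(std::stoi(token));
        pos = comma + 1;
    }
    return order;
}
```

Note that an unset variable (all devices visible) is different from an empty one, so a caller would need to check for presence before parsing.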
* Update CHANGELOG
* tweak to rocprofiler_agent_runtime_visiblity_t
* Code object kernel address
- new fields in code_object_kernel_symbol_register_data_t
- kernel_code_entry_byte_offset
- kernel_address
* Support ROCR_VISIBLE_DEVICES reordering devices for HIP
* Addressed code review changes
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* rocprofv3: do not abort if counter does not have dimensions
* Relax error handling further in rocprofv3 metadata
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Force HSA_AMD_MEMORY_POOL_EXECUTABLE_FLAG value to be used with HSA calls
Fix for CI
* More tweaks
* Increase reproducible-runtime kernel sleep granularity
* Fix data race in synchronous device counter collection sample
* Update device counting service
- add get_active_context function
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Add regex for undefined behavior to ROCPROFILER_DEFAULT_FAIL_REGEX
- add UBSAN_OPTIONS to setup-sanitizer-env.sh
* Improve ROCPROFILER_DEFAULT_FAIL_REGEX
* Use -fno-sanitize-recover=undefined flag
- this compiler flag causes all undefined behavior errors to exit
* Revert ROCPROFILER_DEFAULT_FAIL_REGEX
* fix for shift overflow
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Manjunath-Jakaraddi <manjunath.jakaraddi@amd.com>
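
For context on the shift overflow fix above: shifting a value by an amount greater than or equal to its bit width is undefined behavior in C++, which UBSan reports and `-fno-sanitize-recover=undefined` turns into a hard exit. The helpers below are a hypothetical illustration of guarding such shifts, not the code changed in this commit.

```cpp
#include <cstdint>

// Returns value << amount, or 0 when the shift amount would be out of
// range for a 64-bit value (where a plain shift is undefined).
constexpr uint64_t
safe_shift_left(uint64_t value, unsigned amount)
{
    return (amount < 64) ? (value << amount) : 0;
}

// Builds a mask with the low `bits` bits set, handling bits >= 64 without
// evaluating an out-of-range shift.
constexpr uint64_t
low_mask(unsigned bits)
{
    return (bits >= 64) ? ~uint64_t{0} : ((uint64_t{1} << bits) - 1);
}
```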
* Adding New HIP APIs
* Format Fix
* Format Fix
* Removing changes from ostream and moving it to format
* Addressing Code Review Comments
* Versioning the new hip calls formatting
---------
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
* [SWDEV-509876] Remove buffer requirement from device counting service
No longer requires a buffer when setting up the device counting
service. This reduces performance overhead when counting samples are
returned immediately (synchronous mode).
* Missed file
* Update source/include/rocprofiler-sdk/device_counting_service.h
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* Update source/lib/rocprofiler-sdk/counters/controller.cpp
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* Update source/lib/rocprofiler-sdk/counters/device_counting.cpp
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* Fixes for build
---------
Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
* Fix async copy validation test
- make the async copy tracing test work regardless of how many HSA memory copies the HIP memory copy decomposes into
* Fix rocprofv3 memory copy tests
* Fix compilation support for hipGraphBatchMemOpNodeGetParams
* Fix rocprofv3-test-summary-*-validate
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Fix HIP data type stringify
- when ROCPROFILER_CI is not defined, provide default for case statements
- Add support for hipGraphNodeTypeBatchMemOp when HIP version is >= 6.4.0
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
We need execute permission for HSA memory (required for IB buffers).
Enforcement is upcoming and will break counter collection (see
ticket).
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Bhardwaj, Gopesh <Gopesh.Bhardwaj@amd.com>
* rocDecode API Tracing support
* Test bin file added to rocdecode. Need to add validate python methods
* Added option to not make rocDecode tests
* Added rocdecode and rocprofv3 tests
* Added csv test
* Address PR comments. Changed tests to use built-in rocstreambit decoder to remove ffmpeg dependency. Changed cmake option to disable tests rather than not build them. Tests work locally, but will fail until rocDecode is built with tracing enabled on CI
* Add option to avoid building rocdecode tests
* Added option to avoid building rocdecode bin file
* Merge conflict error
* CMake files changed in response to review comments. Attempting to implement callbacks.
* Turned off test building for rocdecode
* Minor fixes for review comments
* Review comments
* Updated formatting
* Document changes and format.hpp reversion. Need to remove iterate args support for now for later update.
* Remove iterate args support
* Remove iterate-args
* enforce abi versioning in macro if
* Fix doc error
* removed spaces to fix indentation error
---------
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* [SWDEV-509659] Skip rocprof device counting tests if lacking permissions
Skips non-intercept test if proper permissions are not obtained
(SYS_PERFMON). This should be the only test that fails due to permission
issues (others do not require the IOCTL to pass).
Regex match sample: https://regexr.com/8b29s
* Update source/lib/rocprofiler-sdk/counters/tests/CMakeLists.txt
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* fix
* Update source/lib/rocprofiler-sdk/counters/tests/CMakeLists.txt
---------
Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>