* [SDK][rocprofv3] HIP API buffer records with args (ext)
- New buffer tracing domain(s) for HIP APIs which include the arguments and the return value in the buffer records
- Update HIP stream support for extended HIP buffer tracing
- Update rocprofv3 tool library and output library to use extended HIP buffer tracing records
* Update stream.cpp
- handle hipStream_t address being reused for a new stream
* Update doxygen docs for rocprofiler_iterate_buffer_tracing_record_args
* Update rocprofv3 tool.cpp
- configure buffer tracing services with HIP_*_API_EXT variants
- tweak logging level for hip_stream_display_callback
* Fix validation tests
- add HIP_RUNTIME_API_EXT and HIP_COMPILER_API_EXT to valid domain names
* Serialization support for buffer tracing args
* Disable stream service for __hipPopCallConfiguration
- this is interpreted as a stream create but it doesn't create a stream
* Fix execute_buffer_record_emplace for HIP extended contexts
* Add uint64_t_retval to rocprofiler_hip_api_retval_t union
- reading in hipError_t_retval during serialization of pointer return value causes undefined behavior
* Fix compilation warning about unused but set parameter
- in hip/stream.cpp
* Add synchronization for async_copy_data
* Fix compilation error
* Fix compilation error
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: e33dff7ad0]
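The `uint64_t_retval` change above can be illustrated with a small sketch. It assumes a simplified union: `example_error_t`, `example_retval_t`, and both helper functions are hypothetical stand-ins, not the SDK's definitions. The only point is that a pointer-sized return value is written and read through the same 64-bit member instead of being read back through the error-code member:

```cpp
#include <cassert>  // for the assert-based check in the usage note
#include <cstdint>

// Hypothetical stand-ins; NOT the SDK's actual definitions.
enum example_error_t : int
{
    example_success = 0,
    example_failure = 1
};

union example_retval_t
{
    example_error_t error_retval;     // for APIs that return an error code
    std::uint64_t   uint64_t_retval;  // for APIs that return a pointer-sized value
};

static_assert(sizeof(void*) <= sizeof(std::uint64_t), "pointer must fit in 64 bits");

// Store a pointer return value through the 64-bit member.
inline example_retval_t make_pointer_retval(const void* ptr)
{
    example_retval_t r{};
    r.uint64_t_retval = static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(ptr));
    return r;
}

// Serialize by reading the same member that was written, so no inactive
// union member is ever read.
inline std::uint64_t serialize_pointer_retval(const example_retval_t& r)
{
    return r.uint64_t_retval;
}
```

Under these assumptions, `serialize_pointer_retval(make_pointer_retval(&x))` yields the address of `x` as an integer.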
* MI300 Stochastic PC sampling SDK API implementation
* ROCProfV3: Stochastic PC sampling Support (#94)
* ROCProfV3: MI300 Stochastic PC sampling initial draft
* ROCProfV3: Initial Stochastic PC sampling Tests (#95)
ROCProfV3: Initial Stochastic PC sampling tests
* Update rocprofiler_pc_sampling_record_stochastic_v0_t
- update doxygen docs for members
- replace rocprofiler_correlation_id_t with rocprofiler_async_correlation_id_t
* Relax the check in JSON tests
* drain PC sampling buffer during finalize_rocprofv3
* Increase timeout for "Test Install Build" step
- 10 minutes -> 20 minutes
- "Test Installed Packages" has a 20-minute timeout, so "Test Install Build" should as well
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 49ce79a5b5]
* Potential fix for code scanning alert no. 24: Use of potentially dangerous function
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* clang-format fix
* use std::localtime_r instead of localtime.
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* localtime_r is defined in global namespace.
---------
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
[ROCm/rocprofiler-sdk commit: c06feccf2a]
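For context on the `localtime` change above: `localtime` returns a pointer to shared static storage, while POSIX `localtime_r` writes into a caller-provided `tm`. A minimal sketch follows; the function name `format_local_time` and the output format are assumptions made for illustration, not code from the commit:

```cpp
#include <cassert>  // for the assert-based check in the usage note
#include <ctime>
#include <string>
#include <time.h>  // POSIX localtime_r (declared in the global namespace)

// Format a time_t as "YYYY-MM-DD HH:MM:SS" in local time without relying on
// the shared static buffer that localtime() returns.
inline std::string format_local_time(std::time_t t)
{
    std::tm tm_buf{};
    // localtime_r fills the caller-provided struct; returns nullptr on failure
    if(::localtime_r(&t, &tm_buf) == nullptr) return {};

    char out[32] = {};
    if(std::strftime(out, sizeof(out), "%Y-%m-%d %H:%M:%S", &tm_buf) == 0) return {};
    return out;
}
```

On success the returned string is always 19 characters long; its content depends on the local time zone.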
* Add Stack IDs
* Add memcpy test
* Add async corr id record
* Async events use `rocprofiler_async_correlation_id_t`
* Sync events use `rocprofiler_correlation_id_t`
* Update ATT to use async IDs
* Review comments
[ROCm/rocprofiler-sdk commit: f27f76716e]
* Model Name fix for rocprofiler_lib.agent
* fixing format
* formatting source
* Adding comments and example
---------
Co-authored-by: Sushma Vaddireddy <svaddire@amd.com>
[ROCm/rocprofiler-sdk commit: ae0db8cee5]
* Update finalization and correlation ID retirement
- directly invoke finalize if only one client
- correlation_id_finalize
* Address PR comments
* Improve logging for correlation_id_finalize
* Fix correlation ID handling in memory allocation service
* Fix clang-tidy issues in hsa-memory-allocation test exe
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 91f7f42104]
* Fix for codeobj HSA table order
* Fix tests
* Format
---------
Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>
[ROCm/rocprofiler-sdk commit: b21452ec11]
* rocprofiler_stream_id_t: opaque handle for a stream
- e.g. HIP stream
- the same HIP stream may map to different HSA queues at different points in the application
- added to:
- rocprofiler_buffer_tracing_hip_api_record_t
- rocprofiler_buffer_tracing_memory_copy_record_t
- rocprofiler_callback_tracing_hip_api_data_t
- rocprofiler_callback_tracing_memory_copy_data_t
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Mark Meserve <mark.meserve@amd.com>
Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com>
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Jakaraddi, Manjunath <Manjunath.Jakaraddi@amd.com>
Co-authored-by: Bhardwaj, Gopesh <Gopesh.Bhardwaj@amd.com>
Co-authored-by: Nagaraj, Sriraksha <Sriraksha.Nagaraj@amd.com>
Co-authored-by: U, Srihari <Srihari.U@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Welton, Benjamin <Benjamin.Welton@amd.com>
Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: Indic, Vladimir <Vladimir.Indic@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit: ccd1e54293]
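One way to picture an opaque stream handle that stays unique even when a `hipStream_t` address is reused (a case mentioned earlier in this log) is a registry that assigns a fresh ID on every stream creation. This is a sketch under stated assumptions: `stream_registry` and its methods are hypothetical, and the SDK's actual bookkeeping may differ:

```cpp
#include <cassert>  // for the assert-based checks in the usage note
#include <cstdint>
#include <unordered_map>

// Hypothetical registry mapping raw stream addresses to unique IDs.
class stream_registry
{
public:
    // Called when a stream is created. A fresh ID is always assigned, even if
    // the same address was used by a stream that has since been destroyed.
    std::uint64_t on_create(const void* stream)
    {
        const std::uint64_t id = ++m_next_id;
        m_ids[stream]          = id;  // overwrite any stale entry for this address
        return id;
    }

    // Called when a stream is destroyed.
    void on_destroy(const void* stream) { m_ids.erase(stream); }

    // Returns 0 for an unknown address (used here to mean "no stream ID").
    std::uint64_t lookup(const void* stream) const
    {
        auto it = m_ids.find(stream);
        return (it == m_ids.end()) ? 0 : it->second;
    }

private:
    std::unordered_map<const void*, std::uint64_t> m_ids;
    std::uint64_t                                  m_next_id = 0;
};
```

Creating, destroying, and re-creating a stream at the same address yields two different IDs, and `lookup` returns the newer one.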
Adds iteration based multiplexing to counter collection. Counter groups can now be specified. These counter groups are collected on a device individually until a specified interval period is reached. When the interval is reached, the next counter group is set to be collected on subsequent kernel executions.
Supplies two new argument types that can be included in YAML/JSON inputs:
pmc_groups: an array of arrays containing the counter groups to run (e.g. [["SQ_WAVES", "GRBM_COUNT"], ["GRBM_GUI_ACTIVE"]])
pmc_group_interval: the number of kernel invocations on a GPU of a group before rotating to the next group
Note: the linked ticket originally proposed a random_seed_generator. It was not implemented because there are very few cases where the groups should be selected randomly; if that is needed, the pattern can be generated randomly ahead of time and supplied as a large list of groups in pmc_groups.
All existing counter functionality should be preserved (selection of counters on specific devices only, profiling of only specific kernels, etc).
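The rotation described above can be sketched as follows. This is an illustration only, assuming one rotator per device: `group_rotator` and `counter_group_t` are hypothetical names, and the real implementation may track state differently:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using counter_group_t = std::vector<std::string>;

// Hypothetical helper: selects the counter group for each kernel dispatch on
// one device and rotates to the next group every `interval` dispatches.
class group_rotator
{
public:
    group_rotator(std::vector<counter_group_t> groups, std::size_t interval)
    : m_groups{std::move(groups)}
    , m_interval{interval == 0 ? 1 : interval}  // treat 0 as "rotate every dispatch"
    {
        assert(!m_groups.empty() && "at least one counter group is required");
    }

    // Returns the group to collect for the next dispatch, then advances state.
    const counter_group_t& next()
    {
        const counter_group_t& current = m_groups[m_index];
        if(++m_count >= m_interval)
        {
            m_count = 0;
            m_index = (m_index + 1) % m_groups.size();
        }
        return current;
    }

private:
    std::vector<counter_group_t> m_groups;
    std::size_t                  m_interval;
    std::size_t                  m_index = 0;
    std::size_t                  m_count = 0;
};
```

With the two groups from the pmc_groups example and an interval of 2, the first two dispatches collect the first group, the next two collect the second, and the fifth wraps back to the first.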
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit: aa88dd44c7]
* [SWDEV-518071] Return HSA not loaded status (device counter collection)
This is a state that a caller would want to know about to understand if
they got no counters because of a failure or if they were trying to
collect counters too early (as is the case in the sample, which can
attempt to collect counters before HSA is inited).
* Minor edit
* format
* [SWDEV-518081] Simplify Metric Loading (#243)
* [SWDEV-518324] Add AST update support
Allows ASTs to be updated (instead of being an unchangeable
static value). Adds a shared pointer return type to protect against
static destructors/modifications from invalidating potentially in use
AST definitions. No functionality/use changes in this PR.
* [SWDEV-518593] Add updatable dimension cache + fix string issues (#252)
* [SWDEV-518593] Add updatable dimension cache + fix string issues
Updates dimension cache to use the same design pattern as AST/Metrics.
Fixes the string scoping issue seen in ASTs, which appears here as well.
* Add rocprofiler_create_counter
Creates derived counters based on input from the API. This PR does three
things:
1. Adds the API + test case
2. Validates that an AST can be constructed from the counter supplied.
3. Updates metrics, ast, and dimension caches to include the new metric.
Metric should be available for use immediately after the call completes.
Due to the regeneration of ASTs, this call should not be performed in
performance sensitive code.
* Suggestion fixes
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
* Minor tweak
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
* Fixes for comments
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
[ROCm/rocprofiler-sdk commit: 007285272b]
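The "shared pointer return type" pattern mentioned for the AST, metric, and dimension caches can be sketched generically. This is an assumption-laden illustration: `updatable_cache` is a hypothetical class template, not the SDK's type. Readers hold a `shared_ptr` snapshot, so replacing the cached value does not invalidate data that is still in use:

```cpp
#include <cassert>  // for the assert-based checks in the usage note
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical updatable cache: get() hands out an immutable snapshot,
// update() swaps in a new value without touching existing snapshots.
template <typename Tp>
class updatable_cache
{
public:
    explicit updatable_cache(Tp initial)
    : m_data{std::make_shared<const Tp>(std::move(initial))}
    {}

    // Returns a snapshot that stays valid for as long as the caller holds it.
    std::shared_ptr<const Tp> get() const
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        return m_data;
    }

    // Replaces the cached value; older snapshots keep the previous value alive.
    void update(Tp value)
    {
        auto next = std::make_shared<const Tp>(std::move(value));
        std::lock_guard<std::mutex> lk{m_mutex};
        m_data = std::move(next);
    }

private:
    mutable std::mutex        m_mutex;
    std::shared_ptr<const Tp> m_data;
};
```

A snapshot taken before `update` still reads the old value afterwards, while a new `get()` returns the updated one. Each update builds a new object, which is consistent with the note above that such calls should stay out of performance-sensitive code.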
* Perfetto duration temp fix setup
* Add timestamp change amounts to ROCP Info
* Groups kernel dispatch info by agent and queue id before sorting. Midpoint interpolation is then performed on the sorted kernels
* Moved dispatch bins into the for-loop
* Fix compilation error by using const ref
* Modified for review comments
* Changed variable names
[ROCm/rocprofiler-sdk commit: 6518c5463d]
* Initial fix for runtime error in id_decode.hpp:set_dim_in_rec()
* actual fix: corrected the handling of case where dim==1 (ROCPROFILER_DIMENSION_NONE)
* removing magic numbers
* minor fix
* fix for invalid bool value at runtime
* clang format
* build fix
---------
Co-authored-by: Welton, Benjamin <Benjamin.Welton@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit: cffda33d3c]
* Fix segfault on fail to query GPU name
* Format
* Review comments
* Format
* Review comment
---------
Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>
[ROCm/rocprofiler-sdk commit: 346c7149dd]
* rocDecode API Tracing support
* Test bin file added to rocdecode. Need to add validate python methods
* Added option to not make rocDecode tests
* Added rocdecode and rocprofv3 tests
* Added csv test
* Address PR comments. Changed tests to use built-in rocstreambit decoder to remove ffmpeg dependency. Changed cmake option to disable tests rather than not build them. Tests work locally, but will fail until rocDecode is built with tracing enabled on CI
* Add option to avoid building rocdecode tests
* Added option to avoid building rocdecode bin file
* Support for rocJPEG API Trace
* Added newline to rocjpeg_version.h
* json-tool code added, initial test/bin commit
* Formatting
* Resolved rocjpeg bin test compilation errors
* Tests implemented. Perfetto module currently resulting in errors, so need to retest whenever it is fixed
* Formatting and compilation errors
* Minor fixes
* Copyright year update and minor fixes
* Doc update fix
* Added rocjpeg csv file in data
* Addresses review comments: Updated fixed Findroc.. and uses root directory as a hint, fixed documentation error, changed tables to use _CORE, minor style fixes
* Added rocdecode and rocjpeg to CI
* Removed rocdecode and rocjpeg from CI and added back build tests option
* Updated Cmake Files
* Added rocDecode and rocJPEG to CI
* Remove cmake line added in error
* Temporarily modified tests to pass if rocdecode or rocjpeg tracing are not supported for CI, cmake changes
* Added find_package for test
* Added back use of system rocDecode and rocJPEG, modifies system files to include prefix path
* Updated no-link to include INCLUDE_DIR/roc(decode|jpeg), added comments for tests
* Resolve merge conflicts and formatting
* Added regex find and replace instead of include for CI
* VAAPI package causing errors on Vega20
* Removed system rocjpeg and rocdecode use temporarily until cmake issues resolved
* Removed workflows regex
* Formatting and minor test modification
* Modified test for vega20
* Update rocDecode and rocJPEG cmake and tests
* Changelog
* Fix merge conflict
* Added back if-statements around add-tests since cmake-generator-expressions are resulting in errors when the packages are missing
* Removed if-found statements, replaced with the TARGET_EXISTS generator expression
* Skip json file for rocjpeg and rocdecode tests if not supported
* Add os import
---------
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 31fe8858d1]
* Counter track for memory allocation is now a running sum showing total allocation
* Address review comments
* Update source/lib/output/generatePerfetto.cpp
Co-authored-by: Meserve, Mark <Mark.Meserve@amd.com>
* Updated to reflect review comments
* Fix compilation errors on CI
* remove braces on scalar
* Fix struct compilation issues
* Removed name_to_id for sanitizer
---------
Co-authored-by: Meserve, Mark <Mark.Meserve@amd.com>
[ROCm/rocprofiler-sdk commit: cc0c401615]
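The "running sum" counter track above amounts to a prefix sum over allocation and free events ordered by time. A minimal sketch, assuming a hypothetical `alloc_event` type with a signed byte delta (the SDK's memory allocation records and the Perfetto output code are not shown here):

```cpp
#include <algorithm>
#include <cassert>  // for the assert-based checks in the usage note
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical event type; the SDK's memory allocation records differ.
struct alloc_event
{
    std::uint64_t timestamp;  // when the event occurred
    std::int64_t  delta;      // +bytes for an allocation, -bytes for a free
};

// Returns (timestamp, total bytes currently allocated) points, ordered by time.
inline std::vector<std::pair<std::uint64_t, std::int64_t>>
running_allocation_total(std::vector<alloc_event> events)
{
    // Order events by timestamp; stable_sort keeps the input order for ties.
    std::stable_sort(events.begin(), events.end(), [](const alloc_event& a, const alloc_event& b) {
        return a.timestamp < b.timestamp;
    });

    std::vector<std::pair<std::uint64_t, std::int64_t>> track;
    track.reserve(events.size());

    std::int64_t total = 0;
    for(const auto& e : events)
    {
        total += e.delta;  // running sum instead of the per-event size
        track.emplace_back(e.timestamp, total);
    }
    return track;
}
```

For an allocation of 1024 bytes, then 256 bytes, then a free of 256 bytes, the track values are 1024, 1280, and 1024.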
* rocprofv3: suppress agent info when no data collected
* Update output config serialization
- full serialization of output configuration
* Update rocprofiler-sdk-att/tests
- add version and soversion
- change output directory
- generate libatt_decoder_summary
- disable tests instead of removing them
* Update rocprofv3 command-line
- make --att-library-path hidden by default
- simplify check_att_capability
- reorder pc sampling options
- add hidden --echo option
- remove ROCPROF_LIST_AVAIL_TOOL_LIBRARY from preload
* Add new rocprofv3 tests for specifying the ATT library path
* Tweak to rocprofv3-test-hsa-multiqueue-att tests
* Update rocprofv3 tool to enable output with att
* Fix standalone test installation
* Revert fetchcontent_makeavailable to fetchcontent_populate
* Revert tests/common/CMakeLists.txt
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 59b41ab5aa]
* [DO NOT MERGE] Misc UUID updates
- this is WIP
* Agent visibility
- Support for ROCR_VISIBLE_DEVICES, HIP_VISIBLE_DEVICES, CUDA_VISIBLE_DEVICES, GPU_DEVICE_ORDINAL
* Update CHANGELOG
* tweak to rocprofiler_agent_runtime_visiblity_t
* Code object kernel address
- new fields in code_object_kernel_symbol_register_data_t
- kernel_code_entry_byte_offset
- kernel_address
* Support ROCR_VISIBLE_DEVICES reordering devices for HIP
* Addressed code review changes
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 6246ec4040]
* rocprofv3: do not abort if counter does not have dimensions
* Relax error handling further in rocprofv3 metadata
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 3071199386]
* Force HSA_AMD_MEMORY_POOL_EXECUTABLE_FLAG value to be used with HSA calls
Fix for CI
* More tweaks
* Increase reproducible-runtime kernel sleep granularity
* Fix data race in synchronous device counter collection sample
* Update device counting service
- add get_active_context function
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 080b2ba451]
* Add regex for undefined behavior to ROCPROFILER_DEFAULT_FAIL_REGEX
- add UBSAN_OPTIONS to setup-sanitizer-env.sh
* Improve ROCPROFILER_DEFAULT_FAIL_REGEX
* Use -fno-sanitize-recover=undefined flag
- this compiler flag causes all undefined behavior errors to exit
* Revert ROCPROFILER_DEFAULT_FAIL_REGEX
* fix for shift overflow
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Manjunath-Jakaraddi <manjunath.jakaraddi@amd.com>
[ROCm/rocprofiler-sdk commit: e743bf5a93]
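The log does not say where the shift overflow occurred, so the following is a generic illustration rather than the actual fix. Shifting a 64-bit value by 64 or more bits is undefined behavior, which UBSan reports and, with `-fno-sanitize-recover=undefined`, turns into a hard failure. A common guard, using the hypothetical helper `low_bit_mask`:

```cpp
#include <cassert>  // for the assert-based checks in the usage note
#include <cstdint>
#include <limits>

// Returns a mask with the lowest `nbits` bits set.
// Guards against shifting by the full width of the type, which would be
// undefined behavior for (1 << 64).
constexpr std::uint64_t low_bit_mask(unsigned nbits)
{
    constexpr unsigned width = std::numeric_limits<std::uint64_t>::digits;  // 64
    return (nbits >= width) ? std::numeric_limits<std::uint64_t>::max()
                            : ((std::uint64_t{1} << nbits) - 1);
}

static_assert(low_bit_mask(0) == 0, "no bits set");
static_assert(low_bit_mask(3) == 7, "lowest three bits set");
```

Requesting 64 or more bits returns an all-ones mask instead of performing an out-of-range shift.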
* SDK: No bg thread if no clients use SDK
* Update CHANGELOG
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 0fbe6cc7b6]
* Adding support for hsa_amd_signal_wait_all
* Fixes for HIP
---------
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
[ROCm/rocprofiler-sdk commit: 02a519e84e]
* Adding New HIP APIs
* Format Fix
* Format Fix
* Removing changes from ostream and moving it to format
* Addressing Code Review Comments
* Versioning the new hip calls formatting
---------
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
[ROCm/rocprofiler-sdk commit: dd5c0ea257]
* [SWDEV-509876] Remove buffer requirement from device counting service
No longer require a buffer to be given when setting up device counting
service. This is to reduce performance overhead in cases where immediate
return of counting samples is being used (synchronous mode).
* Missed file
* Update source/include/rocprofiler-sdk/device_counting_service.h
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* Update source/lib/rocprofiler-sdk/counters/controller.cpp
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* Update source/lib/rocprofiler-sdk/counters/device_counting.cpp
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* Fixes for build
---------
Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit: 0c4a56c6bb]
Fix HSA_AMD_MEMORY_POOL_EXECUTABLE_FLAG for ROCm < 6.4
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 72a27feb04]