* Using semaphore to sync with all peer processes in finalization stage
[rocprofv3] Implement synchronization using POSIX semaphore in finalization
* clang format code
* clang 11 format code
* Add process sync option for rocprofv3
* Default value of process sync is false
* Update source/lib/rocprofiler-sdk-tool/tool.cpp
Apply suggestion by Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* update according to comments
* add new line to helper.hpp
---------
Co-authored-by: Huanran Wang <huanrwan@amd.com>
Co-authored-by: Huanran Wang <huanran.wang@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* rocprofv3: LD_PRELOAD for signal and sigaction
- wrappers around `signal` and `sigaction` to prevent applications which install signal handlers to replace the rocprofv3 signal handlers
- minor tweaks to buffer sizes (use page_size instead of
KiB)
* [DO NOT COMMIT] extra logging
* Switch git submodule url for perfetto
- use GitHub URL as this is more accessible
* Update ring_buffer<Tp>
- account for alignment padding
* Update buffered_output
- track number of bytes stored
- add nullptr checks
* Update tmp_file_buffer
- track number of bytes
- read_tmp_file does not create tmp file if it does not already exist
* Update tmp_file
- add exists member function for checking whether temporary file already exists
- tweak remove() implementation
* Update config.hpp
- add option to enable/disable signal handlers
- add option for minimum_output_bytes
* Make signal, sigaction functions visible
* rocprofv3 tool updates
- chained signals
- override the signal handler(s) installed by the application
- improve cleanup of temporary files
- support minimum output bytes
* Add commandline support
* fixing test
* minor fix
* minor fix
* fix clang issue
* fix
* Adding docs
* review comments
* review changes
* review
* YUV pulldown additions to rocdecode
* More rocdecode changes
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <Jonathan.Madsen@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit: 87badfbd15]
Adds iteration based multiplexing to counter collection. Counter groups can now be specified. These counter groups are collected on a device individually until a specified interval period is reached. When the interval is reached, the next counter group is set to be collected on subsequent kernel executions.
Supplies two new argument types that can be included in YAML/JSON inputs:
pmc_groups: an array of arrays containing the counter groups to run (i.e. [ ["SQ_WAVES", "GRBM_COUNT"], ["GRBM_GUI_ACTIVE"])
pmc_group_interval: the number of kernel invocations on a GPU of a group before rotating to the next group
Note: originally there was a random_seed_generator proposed in the linked ticket, that was not implemented since there are very few instances where you would want the selection of the groups to be randomly generated (and if you do, you can randomly generate the pattern and place it as a large list of groups in pmc_group).
All existing counter functionality should be preserved (selection of counters on specific devices only, profiling of only specific kernels, etc).
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit: aa88dd44c7]
* rocDecode API Tracing support
* Test bin file added to rocdecode. Need to add validate python methods
* Added option to not make rocDecode tests
* Added rocdecode and rocprofv3 tests
* Added csv test
* Address PR comments. Changed tests to use built-in rocstreambit decoder to remove ffmpeg dependancy. Changed cmake option to disbale tests rather than not build them. Tests work locally, but will fail until rocDecode is built with tracing enabled on CI
* Add option to avoid building rocdecode tests
* Added option to avoid building rocdecode bin file
* Support for rocJPEG API Trace
* Added newline to rocjpeg_version.h
* json-tool code added, initial test/bin commit
* Formatting
* Resolved rocjpeg bin test compilation errors
* Tests implemented. Perfetto module currently resulting in errors, so need to retest whenever it is fixed
* Formatting and compilation errors
* Minor fixes
* Copyright year update and minor fixes
* Doc update fix
* Added rocjpeg csv file in data
* Addresses review comments: Updated fixed Findroc.. and uses root directory as a hint, fixed documentation error, changed tables to use _CORE, minor style fixes
* Added rocdecode and rocjpeg to CI
* Removed rocdecode and rocjpeg from CI and added back build tests option
* Updated Cmake Files
* Added rocDecode and rocJPEG to CI
* Remove cmake line added in error
* Temporarily modified tests to pass if rocdecode or rocjpeg tracing are not supported for CI, cmake changes
* Added find_package for test
* Added back use of system rocDecode and rocJPEG, modifies system files to include prefix path
* Updated no-link to include INCLUDE_DIR/roc(decode|jpeg), added comments for tests
* Resolve merge conflicts and formatting
* Added regex find and replace instead of include for CI
* VAAPI package causing errors on Vega20
* Removed system rocjpeg and rocdecode use temporarily until cmake issues resolved
* Removed workflows regex
* Formatting and minor test modification
* Modified test for vega20
* Update rocDecode and rocJPEG cmake and tests
* Changelog
* Fix merge conflict
* Added back if-statements around add-tests since cmake-generator-expressions are resulting in errors when the packages are missing
* Removed if found statements, replaced with TARGET:EXISTS
* Skip json file for rocjpeg and rocdecode tests if not supported
* Add os import
---------
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 31fe8858d1]
* rocDecode API Tracing support
* Test bin file added to rocdecode. Need to add validate python methods
* Added option to not make rocDecode tests
* Added rocdecode and rocprofv3 tests
* Added csv test
* Address PR comments. Changed tests to use built-in rocstreambit decoder to remove ffmpeg dependancy. Changed cmake option to disbale tests rather than not build them. Tests work locally, but will fail until rocDecode is built with tracing enabled on CI
* Add option to avoid building rocdecode tests
* Added option to avoid building rocdecode bin file
* Merge conflict error
* CMake files changed in response to review comments. Attempting to implement callbacks.
* Turned off test building for rocdecode
* Minor fixes for review comments
* Review comments
* Updated formatting
* Document changes and format.hpp reversion. Need to remove iterate args support for now for later update.
* Remove iterate args support
* Remove iterate-args
* enforce abi versioning in macro if
* Fix doc error
* removed spaces to fix indentation error
---------
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
[ROCm/rocprofiler-sdk commit: e307b89ca4]
* Adding tool pc sampling support
Fixing merge issue
tool support on SDKupdates
link amd-comgr
Sanitizer failure fix
fix format
Addressing review comments
misc fix
Adding dispatch id to the CSV output
AddingCHANGELOG
[ROCProfV3][PC Sampling] Initial ROCProfV3 PC sampling tests for JSON and CSV formats (#17)
ROCProfV3 initial tests for JSON and CSV output.
Simple kernels that simplify the verification of samples to instruction decoding
has been introduced.
removing option to enable pc sampling explicitly
Adding documentation
no pc-sampling option in tests anymore
Addressing review comments
Updating docs
an option for choosing whether all units must be sampled
try ignoring PC sampling tests (#36)
* run pc-sampling tests on MI2xx runners
* use v_fmac_f32 instead of s_nop 0 in tests
* fixing docs
[ROCm/rocprofiler-sdk commit: 50b185b9ac]
* Initial commit: Need to implement wrapper function to collect data and test that wrapper function is correctly replacing core HSA functions
* Attempted to implement wrapper implementation for hsa memory allocation functions. Need to modify generate record files and test if implementation is working as expected
* Debugging and implementing generateCSV function
* Memory allocation size and starting address outputted to csv and json file formats
* Formatting
* Initial setup for OTF2 and Perfetto generation
* Collecting agent id for memory_allocation and formatting
* Modified memory_allocation.cpp to set up code for AMD_EXT commands
* Support for memory_pool_allocate added
* Removed accidently added file
* Made flag optional and added more OTF2 and Perfetto code. Needs testing to ensure perfetto and OTF2 works
* Formatting
* Fixed perfetto and otf2 output
* Fixed flag issue due to incorrect buffer use
* Updated documentation
* Small cleaning and comments
* Added test for HSA memory allocation tracing
* Fixed summary test validation errors due to allocation tracing. Added type to location_base to create unique event ids for allocation due to OTF2 trace error
* Decreased lower limit of hip calls for test
* Modified summary tests to vary number of allocate requests
* Minor fixes to address comments. Still need to address OTF2 comments
* Fix docs and changed OTF2 to use enum for type specified in location_base construction
* Fixed schema error
* Added vmem command tracking. Need to add test
* Updated test to work with vmem command and updated generateCSV to output int instead of hex string.
* OTF2 enum update and mispelling fix
* CI does not support Virtual Memory API. Removed vmem test. Will add back if CI is modifed to suport vmem API
* Update CMakeLists.txt for memory allocation test
* Updated summary test
* Minor fixes to address comments
* Moved domain_type.hpp enum to before LAST
* Fixed compile errors and formatting
* Fixed stats summary domain name error
* Added rocprofv3 test
* Page migration test fix
* Undo page migration test changes. Failures do not appear to have to do with memory allocation
[ROCm/rocprofiler-sdk commit: 3bd7773cf7]