* Update HIP VERSION to 7.0.0, HIP_RUNTIME_API_TABLE_STEP_VERSION to 13
* Changed rocprofiler_config_interfaces.cmake to check version after finding package
* Update cmake/rocprofiler_config_interfaces.cmake
---------
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* Updated kernel rename service to use internal correaltion IDs for external correlation IDs and kernel rename values
* Updated for review comments
* Changed if-condition in generateJSON.cpp to check if string view is empty before calling get_entry
* [rocprofv3] Add rocpd output support (part 1: prelude)
- git submodules for sqlite3, GOTCHA, and pybind11
- HIP stream data
- rocprofiler_query_intercept_table_name(...)
- serialization load
- rocprofiler::sdk::get_perfetto_category(KindT)
- rocprofiler::sdk::parse::strip
- common library updates
- md5sum
- hasher
- simple_timer
- static_tl_object
- get_process_start_time_ns(pid_t)
- output library updates
- node_info
- file_generator (generator is now virtual base class)
- stream info updates
* Added submodules
* Code review updates
* Minor unused-but-set-X warning fixes
* Update CI
- install libsqlite3-dev package
* Update CI
- install libsqlite3-dev package
* Fix static thread-local object memory leak
- also fix signal handler chaining
* Remove URL from comment
* Remove page migration exception
* Enable ROCPROFILER_BUILD_SQLITE3 by default
- try find_package(SQLite3) first and then build when ROCPROFILER_BUILD_SQLITE3=ON
* Fix gotcha installation
- make install of target optional
* Validate tracing + counter collection dispatch data
- i.e. correlation ids, thread ids, timestamps
* Make find_package(SQLite3) optional
- ROCm CI does not have SQLite3 dev package installed and cannot build from source (missing tclsh)
* Fixes to tracing + counter collection test
* get_process_start_time_ns update
- original implementation did not work
* Fix pytest-packages test_perfetto_data for counter collection
- erroneous failure when used with same PMC + multiple agents
* cmake policy: option() honors normal variables
- for GOTCHA submodule
* Improve samples/api_buffered_tracing stability
- reduce likelihood of sporadic exception throw
* Update gotcha submodule
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Add trace decoder to API.
* Cleanup and activity
* Rename
* Minor fix
* Replace tt/TT with thread_trace/THREAD_TRACE
- public API types are not abbreviated
* Fix aliases
* Build system updates
- activate clang-tidy for all subfolders in lib
- fix addition of sources for att-tool
* Fix clang-tidy issues with lib/att-tool/counters.{hpp,cpp}
* Delete counters.cpp
* Formatting
---------
Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Convert YAML Format
Convert YAML format and reader to properly read the YAML.
Comparison between output's from the YAML show only changes in ordering
of architectures (and ids).
* Test fixes
* Add script for converting the YAML schema to source/scripts
* Update documentation
* Change the extra counter code block to YAML
* Add missing new line at EOF
* remove name issues
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Fix double buffer data race
- fixes relatively rare data race in double buffering scheme
In `rocprofiler::buffer::instance::emplace`, the `container::record_header_buffer::get_record_headers()` function returned a `std::vector<rocprofiler_record_header_t*>` and then invoked callback to tool. It was possible for that callback to still be executing while the buffer was being updated. This potentially introduced a scenario where the rocprofiler_record_header_t* was modified (or corrupted) before the tool processed the record. In rocprofv3, this would result in a "future" buffer record showing up among "past" buffer records. E.g., correlation id sequence of 1-15 where the buffer flushes after five values, could result in this during processing:
| | | | | |
|:---:|:---:|:---:|:---:|:---:|
| 1 | 2 | 3 | 4 | 15 |
| 6 | 7 | 8 | 9 | 10 |
| 11 | 12 | 13 | 14 | 15 |
Because buffer A (of double buffering scheme) originally containing corr ids 1-5 stalled after process corr id 4 (e.g. write to disk), buffer B filled up with 6-10 and started flushing, causing a switch back to buffer A, and buffer A was filled with 11-15 by the time callback accessed what was originally corr id 5 but was now updated to corr id 15.
* Update CHANGELOG
* misc minor cleanup
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Adding Benchmarking Stg1
* config fix
* reset
* add jpeg and decode traces in iteration
* address comments benchmark config files.
* address comments.
* address comments.
* address comments: revert cntrl ctx.
* address comments: revert csv output.
* resolve merge conflits.
* format.
* build fix.
* fix hip runtime api traces.
* loop cb services.
* format.
* bug fix.
* Fix operator>
- public C++ comparison operator
* Update configuration options
- support selected regions (--selected-regions)
- support writing output config json (--output-config)
- update serialization data
* rocprofv3 tool library misc updates
- lambda for starting context
- support for writing config json
* Tool library updates
- Finished support for all benchmarking modes
- Added build spec support to config json
* Fix ROCPROFILER_SOVERSION
- this value should not be multiplied by 10,000
* Minor tweak to rocprofv3
* Benchmarking scripts
* formatting
* Fix duplicate include
* Add reproducible-dispatch-count test app
- used in benchmarking
* registration logging
- report number of registered contexts and active contexts after client initialization
* Serialize environment in rocprofv3 output config
* ROCPROFILER_BUILD_BENCHMARK CMake option
* Update benchmark SQL schema
- hash_id is text
- add md5sum to benchmarked_app
- remove app_id from benchmarked_sdk
- add sdk_id to benchmark_config
- separate hip_trace into hip_runtime_trace and hip_compiler_trace
- use INT instead of INTEGER for MySQL compatibility
- add count column in benchmark_statistics
- allow std_dev to be NULL in benchmark_statistics
* Update rocprofv3-benchmark.py
- use md5 instead of python hash (which includes random seed)
- use args.mysql_database
- compute md5sum of executable
- fix insert_benchmark_config
- marker trace fixes
- memory allocation fixes
- split hip_trace into hip_{runtime,compiler}_trace
- remove app_id from benchmarked_sdk
- support warmup runs
- count field in benchmark_statistics
* Support launcher and environment in YAML
* Update reproducible-dispatch-count.cpp
- support mode which doesn't use hip event timing
* Misc rocprofv3-benchmark.py updates
- fix some MySQL support
- remove some unnecessary logging
* support mysql db.
* Format.
* Updated SQL input files
- moved benchmark_schema.sql to benchmark_table.sql
- added benchmark_views.sql
- uses {{metric}} syntax for variable substitution
* cmake formatting
* update rocprofv3-benchmark.py
- benchmark config labels
- overhead views
* Encode rocprofv3-benchmark PID in rocprofv3 and timem output files
* Minor tweak to benchmark_views.sql
- include count
- reorder fields for readability
* split statements and use IS if values is NONE.
* use backtick instead of double quotes and add IS before NOT NULL.:
* Adding Mandelbrot Benchmark App
* Adding Dockerfile example
* Update dockerfile
* Update dockerfile
* [SDK] rocprofiler_query_external_correlation_id_request_kind_name
* Execution-profile benchmark mode
* Execution profile SQL support
* Rename mandlebrot folder + misc clang-tidy
* [rocprofv3-benchmark] Execution profile support
* Update installation
* add work dir when setting git revision, useful when building outside src.
* Set FULL_VERSION_STRING and ROCPROFILER_SDK_GIT_REVISION
- when benchmark folder is top-level
* Remove unused python packages from requirements.txt
* Use ldd/pyelftools to include linked libs for md5sum
- also add --filter-benchmark and --filter-rocprofv3 options
- support labeling the rocprofv3 options
- use more argparse groups
- more generic application of filters
- support variable substitution in environment, e.g. PATH=/some/path:$PATH
* Environment improvements
- improve reproducibility when env set via input file vs. shell
- support "environment-ignore" to remove environment variables
* Misc formatting
* Misc. fix
* use backticks for defining new columns name
* Support shuffling the order of benchmark modes/rocprofv3 args
* Address review comments
* Update Dockerfile
- rename to Dockerfile
- reduce to one layer
* Support docker build arg BRANCH
---------
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Fix stream duplication and fixed tests
* Added comments to explain stream.cpp code, change stream nullptr check to occur in update table to prevent readding null stream, simplified hip-streams bin file code, add destroyStreams to hip-streams bin file code
* Removed roctx from CMakeLists.txt
* Updated documentation
* Fix documentation
* Removed update_table for HIP compiler table and updated stream.cpp to remove support for HIP compiler table
* Added runtime initialization check for HIP
* Changed tool name, working on fixing memory management
* Added context for counter collection kernel rename combination
* Changed name from map to set and changed description
* Fix documentation description for group-by-queue
* Merged memory copy and kernel operations onto a single track when on the same stream
* Updated perfetto output to remove hardware information from track name to merge all memory copy and kernel operations on the same stream to the same track:
* Most pr comments addressed
* Added filter for counter collection and removed kernel buffer tracing hack
* Added PR comment fixes
---------
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* Fix code coverage workflow
* Relocate rocprofv3 conversion test script + rename tests
- these are rocprofv3 tests and were not properly located and not properly named
* Fix thread sanitizer
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* [rocprofv3] Use -P for collection period option
- Reserve -p for profiler attachment
* Update changelog
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Fix compilation for output library
- link to targets for ATT (amd-comgr, dw, elf)
* Relax correlation ID retirement log failures
- only fail for correlation ID retirement underflow when building in CI mode
* Fix shebang for several files
- license was inserted before shebang in several places
* Update code coverage exclude folders for samples
* Tweak to agent tests
- test to make sure hsa agent is not the old value instead of testing that it is the new value
* Fix libdw include/link
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* add error log when accumulate is used on counters not from sq block.
* Address comments.
---------
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
* SWDEV-515895: Fix for hang in hsa_barrier
- Root cause of hang is enqueue_packet using a completed barrier
- This occurs because set_barrier does not correctly mark the barrier
as complete.
- Fixed by consolidating duplicate code for how a barrier is:
- Marked as complete
- Added common function and trace message
- Checked as complete
- Removed internal completion tracking variable in favor of signal
- Added some log messages which were useful in tracking this hang
- Destroying an hsa_barrier now calls the provided completion callback
* formatting
* Fix hsa_barrier test
- Removes assumption that enqueue_packet always generates packets
- Add assertion that a complete barrier does not generate packets
* formatting
* Address review comments
* rocprofv3: LD_PRELOAD for signal and sigaction
- wrappers around `signal` and `sigaction` to prevent applications which install signal handlers to replace the rocprofv3 signal handlers
- minor tweaks to buffer sizes (use page_size instead of
KiB)
* [DO NOT COMMIT] extra logging
* Switch git submodule url for perfetto
- use GitHub URL as this is more accessible
* Update ring_buffer<Tp>
- account for alignment padding
* Update buffered_output
- track number of bytes stored
- add nullptr checks
* Update tmp_file_buffer
- track number of bytes
- read_tmp_file does not create tmp file if it does not already exist
* Update tmp_file
- add exists member function for checking whether temporary file already exists
- tweak remove() implementation
* Update config.hpp
- add option to enable/disable signal handlers
- add option for minimum_output_bytes
* Make signal, sigaction functions visible
* rocprofv3 tool updates
- chained signals
- override the signal handler(s) installed by the application
- improve cleanup of temporary files
- support minimum output bytes
* Add commandline support
* fixing test
* minor fix
* minor fix
* fix clang issue
* fix
* Adding docs
* review comments
* review changes
* review
* YUV pulldown additions to rocdecode
* More rocdecode changes
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <Jonathan.Madsen@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
* MI300 Stochastic PC Sampling Documentation
* Stochastic PC sampling title renaming
---------
Co-authored-by: Welton, Benjamin <Benjamin.Welton@amd.com>
* Update header file includes
* Fix includes for lib/rocprofiler-sdk/hip/hip.hpp
* Minor touch ups
* Minor include improvements
* Doxygen tweak
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Added buffer tracing support for rocdecode and updated tests to work with buffer tracing
* Updated perfetto to output args individually rather than as a string list
* Updated docstrings and operation type, changed OTF2 code to remove warning due to change in operation type
* Updated tests for review comments
* Test args exist and return value
* Updated to use string entry
* Change function name
* Updated PR to reflect review comments
* Updated for PR review comments
* Change function name