* rocDecode API Tracing support
* Test bin file added to rocdecode. Need to add validate python methods
* Added option to not make rocDecode tests
* Added rocdecode and rocprofv3 tests
* Added csv test
* Address PR comments. Changed tests to use built-in rocstreambit decoder to remove ffmpeg dependency. Changed cmake option to disable tests rather than not build them. Tests work locally, but will fail until rocDecode is built with tracing enabled on CI
* Add option to avoid building rocdecode tests
* Added option to avoid building rocdecode bin file
* Merge conflict error
* CMake files changed in response to review comments. Attempting to implement callbacks.
* Turned off test building for rocdecode
* Minor fixes for review comments
* Review comments
* Updated formatting
* Document changes and format.hpp reversion. Need to remove iterate args support for now for later update.
* Remove iterate args support
* Remove iterate-args
* enforce abi versioning in macro if
* Fix doc error
* removed spaces to fix indentation error
---------
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* Misc AFAR VII updates + clang-tidy-19 + bump version to 0.6.0
- move tests/rocprofv3/trace-period to tests/rocprofv3/collection-period
- bump clang-tidy to v19
- fix misc clang-tidy errors
* Update the collection period test
- don't attach files on fail bc when test is disabled, it causes problems
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Adding changes to register and read symbols from the hip fat binary
* adding json output for host_functions
* added error handling
* adding json tool support
* Adding tests
* formatting changes
* Adding documentation
* refactoring as per amd-staging
* Adding initializers and changing macros
* Fix page-migration background thread on fork (#31)
* Fix page-migration background thread on fork
After falling off main in the forked child, all the children
try to join on the parent's monitoring thread. This results
in a deadlock. Parent is waiting for the child to exit, but
the child is trying to join the parent's thread which is
signaled from the parent's static destructors.
Even with just one parent and child, due to copy-on-write
semantics, a child signalling the background thread to join
will still block (thread's updated state is not visible
in the child).
This fix creates background threads on fork per-child with a
pthread_atfork handler, ensuring that each child has its own
monitoring thread.
* Formatting fixes
* Detach page-migration background thread and update test timeout
* Attach files with ctest
* Update corr-id assert
* Tweak on-fork, simplify background thread
* Revert thread detach
* Adding --collection-period feature in rocprofv3 to match v1/v2 parity (#9)
* Adding Trace Period feature to rocprofv3
* Adding feature documentation
* Update source/bin/rocprofv3.py
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Fixing format
* Moving to Collection Period and changing the input params
* Format Fixes
* Fixing rebasing issues
* Removing atomic include from the tool
* Adding more options for units, optimizing the code
* Fixing rocprofv3.py
* Fixing time conv & adding time controlled app
* Fixing format
* Changing to shared memory testing methodology
* use of shmem
* Fix include headers for transpose-time-controlled.cpp
* Format upload-image-to-github.py
* Removing shmem and using only env var to dump timestamps from the tool
* Tool Fixes + Test Config
* Adding Tests
* Fixing Review comments
* Update trace period implementation
* Update trace period tests
* check between start and stop timestamps
* Merge Fix
* Update validate.py
* Improve safety of rocprofiler_stop_context after finalization
* Pass context id to collection_period_cntrl by value
* Adding 20 us error margin
* Ensure log level for collection-period test is not more than warning
---------
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
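The collection-period validation mentioned above (checking that records fall between the start and stop timestamps, with a 20 us error margin) could look roughly like the following. This is a hedged sketch, not the contents of validate.py: the function name, the nanosecond units, and the idea of applying the margin on both ends are assumptions.

```python
MARGIN_NS = 20_000  # 20 us expressed in nanoseconds (units are an assumption)


def outside_collection_period(timestamps_ns, start_ns, stop_ns, margin_ns=MARGIN_NS):
    """Return the timestamps that fall outside [start - margin, stop + margin]."""
    lower = start_ns - margin_ns
    upper = stop_ns + margin_ns
    return [ts for ts in timestamps_ns if not (lower <= ts <= upper)]
```

A test would then assert that the returned list is empty for every collection window dumped by the tool.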
* Update lib/rocprofiler-sdk/code_object/hip/code_object.*
- move error code check macros to implementation
- fix macros which check error code
- use constexpr values instead of #define
* Update lib/rocprofiler-sdk/code_object/hip/code_object.*
- debugging for error that cannot be locally reproduced
* Update lib/rocprofiler-sdk/code_object/hip/code_object.*
- improve error handling and logging
* Update lib/rocprofiler-sdk/code_object/hip/code_object.*
- tweak to non-fatal logging messages
* Update lib/rocprofiler-sdk/code_object/hip/code_object.*
- cleanup of logging messages
* Update host kernel symbol register data fields
* Update source/lib/rocprofiler-sdk/code_object/hip/code_object.hpp
---------
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Kuricheti, Mythreya <Mythreya.Kuricheti@amd.com>
Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com>
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Adding tool pc sampling support
Fixing merge issue
tool support on SDK updates
link amd-comgr
Sanitizer failure fix
fix format
Addressing review comments
misc fix
Adding dispatch id to the CSV output
Adding CHANGELOG
[ROCProfV3][PC Sampling] Initial ROCProfV3 PC sampling tests for JSON and CSV formats (#17)
ROCProfV3 initial tests for JSON and CSV output.
Simple kernels that simplify the verification of samples to instruction decoding
have been introduced.
removing option to enable pc sampling explicitly
Adding documentation
no pc-sampling option in tests anymore
Addressing review comments
Updating docs
an option for choosing whether all units must be sampled
try ignoring PC sampling tests (#36)
* run pc-sampling tests on MI2xx runners
* use v_fmac_f32 instead of s_nop 0 in tests
* fixing docs
Adds rocprofiler_load_counter_definition. This function allows a counter definition file to be supplied to rocprofiler-sdk directly. Takes in a string containing the counter definition YAML, its size (in bytes), and a flag value to state whether this is an append operation or not.
---------
Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: usrihari123 <srihari.u@amd.com>
* Initial commit: Need to implement wrapper function to collect data and test that wrapper function is correctly replacing core HSA functions
* Attempted to implement wrapper implementation for hsa memory allocation functions. Need to modify generate record files and test if implementation is working as expected
* Debugging and implementing generateCSV function
* Memory allocation size and starting address outputted to csv and json file formats
* Formatting
* Initial setup for OTF2 and Perfetto generation
* Collecting agent id for memory_allocation and formatting
* Modified memory_allocation.cpp to set up code for AMD_EXT commands
* Support for memory_pool_allocate added
* Removed accidentally added file
* Made flag optional and added more OTF2 and Perfetto code. Needs testing to ensure perfetto and OTF2 works
* Formatting
* Fixed perfetto and otf2 output
* Fixed flag issue due to incorrect buffer use
* Updated documentation
* Small cleaning and comments
* Added test for HSA memory allocation tracing
* Fixed summary test validation errors due to allocation tracing. Added type to location_base to create unique event ids for allocation due to OTF2 trace error
* Decreased lower limit of hip calls for test
* Modified summary tests to vary number of allocate requests
* Minor fixes to address comments. Still need to address OTF2 comments
* Fix docs and changed OTF2 to use enum for type specified in location_base construction
* Fixed schema error
* Added vmem command tracking. Need to add test
* Updated test to work with vmem command and updated generateCSV to output int instead of hex string.
* OTF2 enum update and misspelling fix
* CI does not support Virtual Memory API. Removed vmem test. Will add back if CI is modified to support vmem API
* Update CMakeLists.txt for memory allocation test
* Updated summary test
* Minor fixes to address comments
* Moved domain_type.hpp enum to before LAST
* Fixed compile errors and formatting
* Fixed stats summary domain name error
* Added rocprofv3 test
* Page migration test fix
* Undo page migration test changes. Failures do not appear to have to do with memory allocation
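The wrapper approach described in the memory-allocation commits above (replace a core allocation entry with a wrapper that records the size and starting address) can be illustrated with a plain dictionary standing in for the API table. Everything here is hypothetical: `trace_allocations`, the record field names, and the table layout are made up for the sketch and do not reflect the real HSA table.

```python
import functools


def trace_allocations(table, name, records):
    """Swap table[name] for a wrapper that logs each allocation, return the original."""
    original = table[name]

    @functools.wraps(original)
    def wrapper(size, *args, **kwargs):
        # Call through to the real function first so the address is known.
        address = original(size, *args, **kwargs)
        # Address is kept as an integer, matching the CSV change noted above.
        records.append({"operation": name, "size": size, "address": address})
        return address

    table[name] = wrapper
    return original
```

Returning the original entry lets a test confirm that the table slot was actually replaced and restore it afterwards.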
* Add rocprofv3-multi-node.md to source/lib/rocprofiler-sdk-tool
* Initial source re-organization
- create "output" static library
* Update include/rocprofiler-sdk/cxx/serialization.hpp
- add GPR count fields to kernel symbol serialization
* Add source/scripts/generate-rocpd.py
- reads one or more JSON output files from rocprofv3 and writes rocpd SQLite3 database
- Note: preliminary implementation
* More reorganization b/t lib/rocprofiler-sdk-tool and lib/output
* Updates to generate-rocpd.py
- add SQL views
- option: --absolute-timestamps -> --normalize-timestamps
- option: --generic-markers
- misc fixes with regards to getting the views working
- support marker names
* Update generate-rocpd.py
- Add --marker-mode option
* Update generate-rocpd.py
- Improve debugging of bad bulk SQLite statements
* Update rocprofv3-multi-node.md
- cleanup of proposed SQL schema
* lib/output/format_path.{hpp,cpp}
- rename format to format_path (in config.hpp and config.cpp)
- move format_path functionality to format_path.{hpp,cpp}
* Rework lib/output/tmp_file_buffer.{hpp,cpp}
* Update output_key.cpp
- support %cwd%, %launch_date%
* Rework lib/output/buffered_output.hpp
* Support csv_output_file constructed via domain_type
* Update lib/output/domain_type.{hpp,cpp}
- get_domain_trace_file_name
- get_domain_stats_file_name
* Update lib/rocprofiler-sdk-tool/tool.cpp
- tweak headers
* Update lib/output/generate*.cpp
- remove include of helpers.hpp
- CSV uses domain_type for filenames
* Update samples/counter_collection/per_dev_serialization.cpp
- make wait_on volatile
* Remove tool_table from lib/output and lib/rocprofiler-sdk-tool
- Also split various structs into their own files
- lib/output/agent_info
- lib/output/metadata
- lib/output/kernel_symbol_info
- lib/output/counter_info
- Implemented rocprofiler::tool::metadata
* Optimize rocprofiler_tool_counter_collection_record_t
- reduce the size of the struct from 24784 bytes to 8376 bytes
* Introduced output_config
- split subset of config (from tools library) into output_config to be able to configure the output generating functions separately from the tool library
- this is a significant step towards the output generating functions not relying on static global memory
* Stream chunks of data into output instead of loading all into memory
* Remove duplicate group_segment_size in rocprofiler_kernel_dispatch_info_t serialization
* Adding Q&A to rocprofv3-multi-node.md
* Remove all remaining include lib/rocprofiler-sdk-tool from lib/output
- migrated a fair amount of code from lib/rocprofiler-sdk-tool/helper.hpp to lib/output
* Update Q&A of rocprofv3-multi-node.md
* Fix minor compilation errors + minor cleanup
* Update hsa/async_copy.cpp
- when ROCPROFILER_CI_STRICT_TIMESTAMPS > 0, reduce the active_signal sync wait time
* Update profiling_time.hpp
- fix log messages for when start/end time is less/greater than enqueue/current CPU time
* Fix generate_stats for tool_counter_record_t
* Dictionary optimization for generate-rocpd.py
---------
Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>
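As a rough picture of what the generate-rocpd.py entries above describe (read one or more JSON outputs, write a SQLite3 database, optionally normalize timestamps), here is a minimal sketch. It is not the real script: the `"events"` key, the table and column names, and the reading of "normalize" as "subtract the earliest start time" are all assumptions made for illustration.

```python
import json
import sqlite3


def write_events(conn, json_texts, normalize_timestamps=False):
    """Load events from JSON documents into a SQLite table and return the row count."""
    events = []
    for text in json_texts:
        events.extend(json.loads(text)["events"])  # "events" is a made-up key

    # Assumed meaning of normalization: shift so the earliest start becomes 0.
    offset = 0
    if normalize_timestamps:
        offset = min((e["start"] for e in events), default=0)

    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (name TEXT, start_ns INTEGER, end_ns INTEGER)"
    )
    # One bulk insert instead of a statement per record.
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [(e["name"], e["start"] - offset, e["end"] - offset) for e in events],
    )
    conn.commit()
    return len(events)
```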
* Relax timestamp checking
- Prevent recurring CI failures that have no remedy until HSA/driver issues are resolved
* Replace "cc" abbreviation in tests with "counter-collection"
* Update CODEOWNERS to explicitly include jrmadsen for source/include
* Extra logging in rocprofiler tool library
* Tweak aborted-app test
- remove counter collection as part of the test
* Check to force tool to initialize the ctx id to zero.
* initialize rocprofiler_context_id_t with 0 in unit tests
* changelog
---------
Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>
* Renamed agent profiling service to device counting service
Name more aptly represents what agent profiling did (device wide
counter collection). Conversion of existing user code can be
performed by the following find/sed command:
find . -type f -exec sed -i 's/rocprofiler_agent_profile_callback_t/rocprofiler_device_counting_service_callback_t/g; s/rocprofiler_configure_agent_profile_counting_service/rocprofiler_configure_device_counting_service/g; s/agent_profile.h/device_counting_service.h/g; s/rocprofiler_sample_agent_profile_counting_service/rocprofiler_sample_device_counting_service/g' {} +
* Converted dispatch profile to dispatch counting service
* Debug for functional counters test
* Minor changes for CI
* Minor fix
* More fixes for CI
* Update evaluate_ast.cpp
---------
Co-authored-by: Benjamin Welton <ben@amd.com>
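For environments without find/sed, the same rename of the agent profiling names can be expressed as a small Python helper. The mapping is copied from the sed command above; `apply_renames` itself is a hypothetical helper, and unlike the sed expression it matches the `.` in `agent_profile.h` literally.

```python
# (old, new) pairs taken from the sed command in the commit message.
RENAMES = [
    ("rocprofiler_agent_profile_callback_t",
     "rocprofiler_device_counting_service_callback_t"),
    ("rocprofiler_configure_agent_profile_counting_service",
     "rocprofiler_configure_device_counting_service"),
    ("agent_profile.h", "device_counting_service.h"),
    ("rocprofiler_sample_agent_profile_counting_service",
     "rocprofiler_sample_device_counting_service"),
]


def apply_renames(text):
    """Apply each rename in order, mirroring the sequential sed substitutions."""
    for old, new in RENAMES:
        text = text.replace(old, new)
    return text
```

Running it over a source tree would still need a file walk (for example with `pathlib.Path.rglob`), which is left out to keep the sketch short.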
* [Draft]: Add support for RCCL tracing
Address comments
* [Draft]: Add support for RCCL tracing
Address PR comments, changes from RCCL upstream
* Add RCCL library table registration
Working on adding support to rocprofiler-register
* Support compilation w/o <rccl/amd_detail/api_trace.h>
- dummy api_trace.h header
- return ROCPROFILER_STATUS_ERROR_NOT_IMPLEMENTED when RCCL does not have api_trace.h header
* RCCL API tracing tool support
- add to rocprofv3
- add to json-tool
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* LD_PRELOAD librocprofiler-sdk-roctx.so when marker-trace enabled
- this enables apps to link against old ROCTx (libroctx64.so) but get marker tracing in rocprofv3
* Update CHANGELOG
* Validation test for app linked to old (roctracer) ROCTx library
* Tweak scope of tool_counter_info
- causing "signal-unsafe call inside of a signal" error for ThreadSanitizer on mi200
* Fix handling of missing transpose-roctracer-roctx
* Disable rocprofv3 aborted-app test (ThreadSanitizer)
- ThreadSanitizer + mi200/mi300 + aborted-app results in a signal-unsafe call inside a signal that cannot be specifically suppressed as usual via rocprofv3_error_signal_handler for some unknown reason
* Add UndefinedBehaviorSanitizer job
* Move include/rocprofiler-sdk/cxx/details/delimit.hpp to tokenize.hpp
* Update docs/how-to/using-rocprofv3.rst
- fix code block indents
- reorder rocprofv3 options, limit them to important options
- add docs for `--runtime-trace`
* Update rocprofv3.py
- parser argument groups
- new `--runtime-trace` option
- new `--summary` option
- new `--summary-per-domain` option
- new `--summary-groups` option
- new `--summary-output-file` option
- new `--summary-units` option
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- fix async copy operation names: add "MEMORY_COPY_" prefix
* lib/rocprofiler-sdk-tool: update statistics.{hpp,cpp}
- statistics<>::get_percent function
- stats_entry_t struct
- stats_formatter struct
- percentage struct
- std::to_string(::rocprofiler::tool::percentage)
* lib/rocprofiler-sdk-tool: update domain_type.{hpp,cpp}
- reorder domain_type enum values
* lib/rocprofiler-sdk-tool: update generateCSV.{hpp,cpp}
- separate writing CSV from accumulating statistics
- a lot of functionality was moved to statistics.{hpp,cpp}
* lib/rocprofiler-sdk-tool: update output_file.{hpp,cpp}
- output_stream_t struct
- get_output_stream(...) returns output_stream_t instance
* lib/rocprofiler-sdk-tool: update generateJSON.cpp
- update get_output_stream usage to output_stream_t
* lib/rocprofiler-sdk-tool: update generateOTF2.cpp
- header include order tweak
* lib/rocprofiler-sdk-tool: update buffered_output.hpp
- stats_data_t was renamed to stats_entry_t
* lib/rocprofiler-sdk-tool: update generatePerfetto.cpp
- header include tweak
* lib/rocprofiler-sdk-tool: update tmp_file_buffer.hpp
- emit warning message if write_ring_buffer fails after offloading instead of aborting
- prefer placement new instead of assignment in write_ring_buffer
* lib/rocprofiler-sdk-tool: add generateStats.{hpp,cpp}
- functions for accumulating statistics
* Update tests/rocprofv3/tracing-hip-in-libraries/CMakeLists.txt
- accommodate tweak to CSV output file name for HIP and HSA traces
* lib/rocprofiler-sdk-tool: update config.{hpp,cpp}
- new config variables
- stats_summary
- stats_summary_per_domain
- summary_output
- stats_summary_unit_value
- stats_summary_unit
- stats_summary_file
- stats_summary_groups
- support output keys for hostname: %hostname% / %h
* lib/rocprofiler-sdk-tool: update tool.cpp
- support summary output
* Documentation fixes
* Test for summary output
* Update tests/bin/transpose to use more ROCTx
- also support building with the roctracer ROCTx
* Remove roctxMark from OTF2 + fix kernel-rename tests
- following more ROCTx calls in transpose, kernel-rename validation had to be updated
* JSON metadata + JSON summary
- add serialization support for config
- add serialization support for statistics
- additions to json spec
- rocprofiler-sdk-tool/metadata/config
- rocprofiler-sdk-tool/metadata/command
- rocprofiler-sdk-tool/summary
- config output_keys support for NVIDIA %q{<ENV-VAR>} syntax
- config output_keys support keys within keys
* rocprofv3 --summary-groups warning if no domain matches
- emit warning if a regex for summary groups did not match any domain names
* Compile fix for lib/rocprofiler-sdk-tool/tool.cpp
- get_config().scratch_memory_trace
- pass contributions to write_json
* Update rocprofv3.py to preload rocprofiler-sdk-roctx
- appended to LD_PRELOAD when args.marker_trace is enabled
* Fix ReST link errors about subtitle underline being too short
* Patch tokenization of config::stats_summary_groups
- guard against array values of empty strings
* Tweak rocprofv3 summary test
- input-summary.yaml (used by rocprofv3-test-summary-inp-yaml-execute) only provides one summary group regex
* Disable LD_PRELOAD of librocprofiler-sdk-roctx.so
- this causes problems in the sanitizers, will be addressed in another PR
* Perfetto submodule
* include/rocprofiler-sdk/cxx/perfetto.hpp
- adapted from tests/common/perfetto.hpp
- updated json-tool to use <rocprofiler-sdk/cxx/perfetto.hpp>
* Update include/rocprofiler-sdk/cxx
- add details/delimit.hpp
- add details/join.hpp
- extend details/mpl.hpp
- extend details/operators.hpp
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- update MEMORY_COPY direction names
* Preliminary perfetto support
* Update lib/rocprofiler-sdk-tool/generatePerfetto.cpp
- fix getting roctx msg vs. buffer operation name
* Temporary variable restructuring
* Perfetto patches after rebasing onto main
* Revert lib/rocprofiler-sdk/hsa/async_copy.cpp
- revert name
* Update lib/rocprofiler-sdk-tool/generatePerfetto.cpp
- fix ReadTrace
* Update tests/bin/hip-in-libraries
- sleep_for
* Support PFTRACE output format option in rocprofv3
* Change perfetto logging
* Update rocprofv3 tests to generate pftrace output
* Minor tweak to json-tool.cpp
* Update requirements.txt for perfetto testing
* Fix data race on amount_read in generatePerfetto.cpp
* Add testing for pftrace output
- relatively simple testing which verifies that the pftrace file has the same number of entries as JSON data for HIP/HSA/marker/kernel/memory_copy
* Fix import in perfetto_reader.py
* Fix data race in generatePerfetto.cpp
- questionable data race within std::regex in CI
- simplify rocprofiler::tool::format
- set config::tmp_directory to default to output_path
- fs::create_directories for tmp_file
- rework get_file_name(...) and compose_tmp_file_name(...) in tool.cpp
- enums for operations should not contain callback/buffer tracing categorization
- e.g. ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT_LOAD should be ROCPROFILER_CODE_OBJECT_LOAD
* adding pandas and pytest to requirements.txt
* setting up requirements.txt
* Update requirements
- formatting packages
- remove packages not directly used by rocprofiler-sdk
* Update cmake formatting, linting, and options
- if BUILD_CI -> force BUILD_DEVELOPER and BUILD_WERROR
- support python installed clang-format and python installed clang-tidy
* Update build.sh
- split into install-deps.sh and install-apt-deps.sh
* Improve code coverage
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
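The output-key entries in this log (`%cwd%`, `%launch_date%`, `%hostname%`, the `%q{<ENV-VAR>}` syntax, and keys within keys) suggest a substitution pass along the following lines. This is a sketch under assumptions, not the tool's implementation: `expand_output_keys`, the date format, and the bounded number of passes used to resolve nested keys are all illustrative choices, and the short `%h` form is not handled.

```python
import os
import re
import socket
from datetime import datetime


def expand_output_keys(template, env=None, now=None, max_passes=4):
    """Replace %key% tokens and %q{VAR} references, repeating for nested keys."""
    env = os.environ if env is None else env
    now = datetime.now() if now is None else now
    keys = {
        "%cwd%": os.getcwd(),
        "%launch_date%": now.strftime("%Y-%m-%d"),  # format is an assumption
        "%hostname%": socket.gethostname(),
    }
    result = template
    for _ in range(max_passes):
        previous = result
        # %q{VAR} expands to the environment variable, or "" if it is unset.
        result = re.sub(r"%q\{([^}]+)\}", lambda m: env.get(m.group(1), ""), result)
        for key, value in keys.items():
            result = result.replace(key, value)
        if result == previous:  # nothing left to expand
            break
    return result
```

Repeating the pass until the string stops changing is what lets an environment variable whose value itself contains a key, such as `%launch_date%`, resolve fully.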
* Add ToolsApiTable
Add ToolsApiTable wrapping for
scratch memory tracking
* Add initial support for scratch memory tracking
Buffering is implemented
* cmake formatting (cmake-format) (#525)
Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com>
* source formatting (clang-format v11) (#524)
Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com>
* Add callback tracing for scratch
Fixed the error where scratch tracking init was called irrespective of whether any client requested it
* Apply suggestions from code review
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
* Fix tools api copy/update
Tables were saved/updated incorrectly in previous
commit. Also adds passing user data through the callback
* Fix OpKind sequence for scratch tracking
Previously scratch was using OpKind from rocprofiler-sdk, but
templates were instantiated using API ID. These differ by 1
* Integration tests for scratch reporting
Added buffer and callback integration tests for scratch reporting
* source formatting (clang-format v11) (#550)
Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>
* cmake formatting (cmake-format) (#551)
Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>
* python formatting (black) (#549)
Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>
* CI fixes
* source formatting (clang-format v11) (#554)
Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>
* Update api
Rebase on main and updates based on PR feedback
* Update scratch reporting and address PR comments
- Added agent id to buffer records
- Updated `test_internal_correlation_ids` - Is almost identical to
one in async-copy
- Updated scratch test to check for agent id
- Updated queue id serialization in callback records (prints
handle as nested key)
- Remove `marker_api_traces` from scratch `test_internal_correlation_ids`
validation test
- Rename `amd_tools_api` to `scratch_memory`
- Added doxygen comments
- Remove scratch callback from `tool.cpp`
- Replace assert with `LOG_IF` in `scratch_memory.cpp`
* Update tools table
Changed to match up with changes to hsa tables in main branch
* Rework scratch memory structure
* Update tests
- Added suggestions from PR review, and updated tests accordingly
* Misc cleanup
* Update scratch test
As of Apr 4th, `hsa_amd_agent_set_async_scratch_limit` is disabled.
Note,
> This API: `hsa_amd_agent_set_async_scratch_limit` is currently
> disabled. We need some changes in CP firmware to be able to do this
> and these changes are not ready yet.
> With the current code, you will also not get notifications for
> alternate-scratch allocations because this feature has been disabled
> while CP firmware is making additional changes
> We are hoping to have that feature enabled by ROCm-6.3
* Minor update to lib/rocprofiler-sdk/internal_threading.*
- delay destruction of shared_ptrs of the tasks to prevent rare (but possible) data race on the destruction of the shared_ptr
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Fix agent node id + randomize offset id
- fixes the node_id value
- randomizes a constant offset for the id.handle values
- switch to using node ids in rocprofiler-sdk-tool library
- update tests related to agents
* Logical node id
- sequential node id values from 0 to (N-1) where N is the number of agents
* Moved tests/apps to tests/bin
* Renamed cmake project in tests/bin
* Update samples
- Use ROCPROFILER_DEFAULT_FAIL_REGEX
- tweaks to stdout messages
* Update tests
- Use ROCPROFILER_DEFAULT_FAIL_REGEX
* Add tests/lib
- libraries with HIP code
* Update PTL submodule
- remove atexit delete of thread_id_map
* Update cmake/rocprofiler_options.cmake
- Set ROCPROFILER_DEFAULT_FAIL_REGEX
* Update common lib: env + logging
- improved customization of logging settings
- default to disabling logging to files
- install failure handler for rocprofv3
- set_env support in environment.*
* Add lib/rocprofiler-sdk/shared_library.cpp
- shared library constructor
* Update lib/rocprofiler-sdk-tool/tool.cpp
- destructor thread safety
- convert callback_name_info and buffered_name_info to pointers
- install failure handler for logging
* Add tests/bin/hip-in-libraries
- hip-in-libraries is an exe which uses two shared libraries where each shared library contains HIP kernels
- used for testing deadlocking within __hipRegisterFatBinary
* Update bin/rocprofv3
- reorganized the env variables
- use exec to launch command
- set ROCPROFILER_LIBRARY_CTOR=1
* Add tests/rocprofv3/tracing-hip-in-libraries
- uses the hip-in-libraries exe, which launches HIP kernels from shared libraries
* Update bin/rocprofv3
- fix counter collection (no exec)
* Update lib/rocprofiler-sdk-tool/tool.cpp
- replace "Kernel-Name" with "Kernel_Name"
* Update lib/rocprofiler-sdk/registration.cpp
- use RTLD_LOCAL instead of RTLD_GLOBAL for env libraries
* Update tests/rocprofv3
- replace "Kernel-Name" with "Kernel_Name"
* Update tests
- vector-ops (bin) stream syncs + runs with 4 queues per device
- improve counter-collection/input1 validation
- rocprofv3/tracing-hip-in-libraries does not do sys-trace
- improved validation script for tracing-hip-in-libraries
- updated dispatch_callback in json-tool.cpp following reworking of prototypes for counter collection
* Update samples/counter_collection
- updated dispatch_callback(s) and record_callback(s) following reworking of prototypes
* Update bin/rocprofv3
- reorganized help menu
- added options for sub-HSA tables
- added --hip-runtime-trace
- changed --hip-trace to include --hip-compiler-trace
* Update lib/rocprofiler-sdk-tool
- improved kernel filtering
- removed arch_vgpr, accum_vgpr, sgpr code (in rocprofiler-sdk)
- fixed issue with counter-collection w/o tracing
- added support for fine grained HSA API tracing
- removed directly linking to HSA-runtime
* Update lib/rocprofiler-sdk/agent.cpp
- rocp_agents != hsa_agents is non-fatal when ROCPROFILER_BUILD_CI=OFF (CMake option)
* GPR (vector and scalar) info in kernel symbol data
- rocprofiler_callback_tracing_code_object_kernel_symbol_register_data_t contains general purpose register info
* Header include order fix
- Include repo headers first
- Third party library headers next
- standard library headers last
* Update dispatch profiling public API
- introduce rocprofiler_profile_counting_dispatch_data_t
- change signature of rocprofiler_profile_counting_dispatch_callback_t and rocprofiler_profile_counting_record_callback_t
- provide rocprofiler_user_data_t pointer in dispatch callback
- provide rocprofiler_user_data_t value (from dispatch cb) in record callback
* Update tests/bin/CMakeLists.txt
- fix add_subdirectory(hip-in-libraries) order
* Update VERSION
- bump to 0.2.0 in prep for AFAR
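The "Logical node id" entry above describes sequential ids from 0 to (N-1) across N agents. One way to picture that mapping is shown below; the helper name and the choice to order agents by their reported node id are assumptions, since the log does not say how the sequence is ordered.

```python
def assign_logical_node_ids(node_ids):
    """Map each reported node id to a sequential logical id in 0..N-1."""
    # Sorting by the reported id is an assumed ordering, chosen for determinism.
    return {node_id: logical for logical, node_id in enumerate(sorted(node_ids))}
```

With reported ids 7, 2, and 4, the agents would receive logical ids 2, 0, and 1 respectively.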