# Create PSDB.yml enabling psdb for github emu staging branch
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [x] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
## Technical details
Moving the internal repo from GitHub to GitHub EMU
## Added/updated tests?
_We encourage you to keep the code coverage percentage at 80% and above._
- [ ] Yes
- [x] No, Does not apply to this PR.
## Updated CHANGELOG?
_Needed for Release updates for a ROCm release._
- [ ] Yes
- [x] No, Does not apply to this PR.
## Added/Updated documentation?
- [ ] Yes
- [x] No, Does not apply to this PR.
---------
Co-authored-by: Mallya, Ameya Keshava <AmeyaKeshava.Mallya@amd.com>
[ROCm/rocprofiler-sdk commit: 53bb4466a4]
Adds rocprofiler_load_counter_definition. This function allows a counter definition file to be supplied to rocprofiler-sdk directly. Takes in a string containing the counter definition YAML, its size (in bytes), and a flag value to state whether this is an append operation or not.
---------
Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: usrihari123 <srihari.u@amd.com>
[ROCm/rocprofiler-sdk commit: 7ddc72ad45]
* Rebased optimizations for rocprofv3 tool
* Fixing merge conflicts
* Formatting
* Open from within mutex
* Small name changes
* Added operator
[ROCm/rocprofiler-sdk commit: 6ae441f785]
* Host trap PC sampling uses new record type
* removing redundant field
* formatting
* simplifying templates in the parser - no need for HostTrap boolean
* reviving some parser tests
* hw_id decoding on GFX9
* HW id parser test
* parser CID test
* Parser multigpu test
* removing rocprofiler_pc_sampling_record_t and some fields from hw_id
* simplifying parser context
* keep bench test internally
* initializing gfx9_hw_id_t differently
* anonymous struct first
* avoiding inlining initialization of struct
[ROCm/rocprofiler-sdk commit: bc52c17e64]
* Runtime initialization tracing
- callbacks and buffer entries notifying when a runtime has been initialized
* Minor cleanup to registration.cpp
* JSON tool implementation
* Increase perfetto_reader timeout
* Handle perfetto_reader timeout when attr doesn't exist
* clang-tidy fixes to memory_allocation.cpp
[ROCm/rocprofiler-sdk commit: 249c50fc40]
* Initial commit: Need to implement wrapper function to collect data and test that wrapper function is correctly replacing core HSA functions
* Attempted to implement wrapper implementation for hsa memory allocation functions. Need to modify generate record files and test if implementation is working as expected
* Debugging and implementing generateCSV function
* Memory allocation size and starting address outputted to csv and json file formats
* Formatting
* Initial setup for OTF2 and Perfetto generation
* Collecting agent id for memory_allocation and formatting
* Modified memory_allocation.cpp to set up code for AMD_EXT commands
* Support for memory_pool_allocate added
* Removed accidentally added file
* Made flag optional and added more OTF2 and Perfetto code. Needs testing to ensure perfetto and OTF2 works
* Formatting
* Fixed perfetto and otf2 output
* Fixed flag issue due to incorrect buffer use
* Updated documentation
* Small cleaning and comments
* Added test for HSA memory allocation tracing
* Fixed summary test validation errors due to allocation tracing. Added type to location_base to create unique event ids for allocation due to OTF2 trace error
* Decreased lower limit of hip calls for test
* Modified summary tests to vary number of allocate requests
* Minor fixes to address comments. Still need to address OTF2 comments
* Fix docs and changed OTF2 to use enum for type specified in location_base construction
* Fixed schema error
* Added vmem command tracking. Need to add test
* Updated test to work with vmem command and updated generateCSV to output int instead of hex string.
* OTF2 enum update and misspelling fix
* CI does not support Virtual Memory API. Removed vmem test. Will add back if CI is modified to support vmem API
* Update CMakeLists.txt for memory allocation test
* Updated summary test
* Minor fixes to address comments
* Moved domain_type.hpp enum to before LAST
* Fixed compile errors and formatting
* Fixed stats summary domain name error
* Added rocprofv3 test
* Page migration test fix
* Undo page migration test changes. Failures do not appear to have to do with memory allocation
[ROCm/rocprofiler-sdk commit: 3bd7773cf7]
* Fix navi3 kernel tracing
- conditional aql::set_profiler_active_on_queue only when counter collection is registered
* Update changelog
* Update following name change
[ROCm/rocprofiler-sdk commit: f7c87e455d]
* Squashed commit of the following:
commit b76f2635f4b65599f03812a73d0cf410f5ada213
Author: Mythreya <mythreya.kuricheti@amd.com>
Date: Fri Apr 26 00:29:09 2024 +0000
Changed for PR feedback
commit bedb8ad566ff42fbf117b19202c26c507abcf8ac
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Thu Apr 25 19:20:06 2024 -0500
Fix installation
commit a98f8a69459a1450a1be9c98e20b3c1e7f2568c2
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Thu Apr 25 19:16:35 2024 -0500
Restructure the headers
commit 46489a020ffafdd5f4ce3f580469ff233ef67fe1
Author: Mythreya <mythreya.kuricheti@amd.com>
Date: Tue Apr 23 23:31:10 2024 +0000
Update hsa include
commit 8e795282cce348fc6aa736b7857b21aeb32aa20a
Author: Mythreya <mythreya.kuricheti@amd.com>
Date: Tue Apr 23 23:02:32 2024 +0000
Report page migration events as start/end
* Updated tests accordingly
* Page migration events are reported independently
commit 8784e5ad4895a626a2a8e4ac12f8021b34172bd4
Author: Mythreya <mythreya.kuricheti@amd.com>
Date: Tue Apr 16 17:01:57 2024 +0000
Update handling of dropped page migration events
Previously, we dropped all locally buffered events when we detected that
KFD had dropped some events. This could drop too many pending events too eagerly.
When we receive an end event and cannot find the corresponding start,
we can be sure that KFD has dropped some events in the immediate past.
When this happens, we look through all locally buffered events and report
the start events that are older than 10s as partial events --- they have
no "end" information (we expect that the end events have been dropped).
We also set the polling timeout to 10s to prevent the local buffer from
getting too large with events waiting to be paired up.
Updated tests
commit 2e8e0b07eeda9b5990e1ae8d28dcd3a035ce38e1
Author: Mythreya <mythreya.kuricheti@amd.com>
Date: Tue Apr 16 17:01:31 2024 +0000
Docs for triggers
* Fix page migration sample
* Fix hasher, kfd install
* Add hsa include
* Install KFD include dir
* Updates from code review
- single timestamp field
- node_id -> agent_id
- from_node -> from_agent
- to_node -> to_agent
* Misc revisions
* Remove page-migration install target
* Update page-migration pytest
* Tweak to serialization
* Address PR comments
* Update page-migration test
* Add cli args, update iterations
* Address PR comments
* Add abi.cpp for static_asserts
* Update page_migration gtest with only runtime tests
* Moved helpers into utils.hpp
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 363f85dc72]
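
The dropped-event handling described in the commit above (pair start/end events, and on an unmatched end report only the buffered starts older than 10s as partial events) can be sketched roughly as follows. This is a simplified model, not the SDK's implementation: the `PendingStarts` class, the event keys, the tuple record layout, and the use of nanosecond timestamps are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# 10 s threshold taken from the commit message; nanosecond units are an assumption.
STALE_AFTER_NS = 10 * 1_000_000_000


@dataclass
class PendingStarts:
    """Hypothetical local buffer of start events waiting for their end event."""

    starts: dict = field(default_factory=dict)  # key -> start timestamp (ns)

    def on_start(self, key, timestamp_ns):
        self.starts[key] = timestamp_ns

    def on_end(self, key, timestamp_ns):
        """Return the records to report in response to an end event."""
        start_ns = self.starts.pop(key, None)
        if start_ns is not None:
            # Normal case: the start/end pair is complete.
            return [("complete", key, start_ns, timestamp_ns)]
        # An end without a start implies events were dropped upstream.
        # Flush only the buffered starts older than the threshold; younger
        # ones stay buffered because their end event may still arrive.
        # (What happens to the orphan end itself is not stated in the
        # source; this sketch simply ignores it.)
        stale = [k for k, ts in self.starts.items()
                 if timestamp_ns - ts > STALE_AFTER_NS]
        return [("partial", k, self.starts.pop(k), None) for k in stale]
```

Keeping the younger starts is the point of the change: the earlier behaviour, as the message describes it, discarded the whole buffer as soon as a drop was detected.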
* cache reference nodes
* evaluation based on dim args
* format
* add dimensions for reduce operator
* add dimensions for reduce operator
* add dimensions for reduce operator docs
* add dimensions for reduce operator.
* refactor switch cases
* Update CHANGELOG.md
* updated doc with data example
* updated doc with data example for reduce operation.
* added fallthrough in switch case sum.
* changelog.md
* format
* fix bug in constuct_test_data()
[ROCm/rocprofiler-sdk commit: 472907a576]
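
As a rough illustration of what reducing a counter over selected dimensions means, the sketch below groups samples by the dimensions that are kept and applies the reduction to each group. It is not the SDK's evaluator: the `reduce_over` function, the sample layout, and the dimension names `xcc` and `se` are hypothetical.

```python
from collections import defaultdict


def reduce_over(samples, dims_to_reduce, op=sum):
    """Collapse the given dimensions of a counter with `op`.

    samples: list of (dims, value) where dims maps a dimension name to an index.
    Returns a dict keyed by the remaining (name, index) pairs.
    """
    groups = defaultdict(list)
    for dims, value in samples:
        # Samples that agree on every dimension that is NOT reduced share a group.
        kept = tuple(sorted((k, v) for k, v in dims.items()
                            if k not in dims_to_reduce))
        groups[kept].append(value)
    return {kept: op(values) for kept, values in groups.items()}


# Illustrative data: three samples over two hypothetical dimensions.
samples = [
    ({"xcc": 0, "se": 0}, 1.0),
    ({"xcc": 0, "se": 1}, 2.0),
    ({"xcc": 1, "se": 0}, 4.0),
]
per_xcc_sum = reduce_over(samples, {"se"})           # sum over "se" only
per_xcc_max = reduce_over(samples, {"se"}, op=max)   # max over "se" only
```

Reducing over every dimension yields a single value under the empty key `()`, which corresponds to a reduction with no dimension arguments in this model.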
* add support for select function in derived counters
* formatting
* renaming select dims variable name from set to map
* format
* Update doc with select() for dimensions
* use : for defining range of values in select dims
* - update dimension for metric after select.
- make sure to raise runtime error if user provides range for a dimension.
* use map instead of unordered_map for select dim info
* new line EOF
* fix bug: select() operator.
* Update evaluate_ast.cpp
format
* added a check for dim value exceeds max.
* Update source/lib/rocprofiler-sdk/counters/evaluate_ast.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update source/lib/rocprofiler-sdk/counters/evaluate_ast.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* updated doc with data example for select operation.
* changelog.md
* Update CHANGELOG.md
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
[ROCm/rocprofiler-sdk commit: cc4811d27d]
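
A minimal sketch of the idea behind `select()` on dimensions, including the check that a requested dimension value does not exceed the maximum. The function name, argument shapes, and dimension names here are assumptions for illustration and do not reflect the syntax or internals of the derived-counter evaluator.

```python
def select(samples, wanted, dim_sizes):
    """Keep only the samples whose dimension indices are all in `wanted`.

    samples:   list of (dims, value), dims maps dimension name -> index
    wanted:    dimension name -> set of indices to keep
    dim_sizes: dimension name -> number of valid indices (hypothetical metadata)
    """
    for name, indices in wanted.items():
        limit = dim_sizes[name]
        for index in indices:
            # Mirrors the "dim value exceeds max" check mentioned in the commit.
            if index < 0 or index >= limit:
                raise ValueError(f"{name}={index} is outside [0, {limit})")
    return [
        (dims, value)
        for dims, value in samples
        if all(dims.get(name) in indices for name, indices in wanted.items())
    ]


# Illustrative data over two hypothetical dimensions.
samples = [
    ({"xcc": 0, "se": 0}, 1.0),
    ({"xcc": 0, "se": 1}, 2.0),
    ({"xcc": 1, "se": 0}, 4.0),
]
sizes = {"xcc": 2, "se": 2}
only_xcc0 = select(samples, {"xcc": {0}}, sizes)
```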
* Add rocprofv3-multi-node.md to source/lib/rocprofiler-sdk-tool
* Initial source re-organization
- create "output" static library
* Update include/rocprofiler-sdk/cxx/serialization.hpp
- add GPR count fields to kernel symbol serialization
* Add source/scripts/generate-rocpd.py
- reads one or more JSON output files from rocprofv3 and writes rocpd SQLite3 database
- Note: preliminary implementation
* More reorganization between lib/rocprofiler-sdk-tool and lib/output
* Updates to generate-rocpd.py
- add SQL views
- option: --absolute-timestamps -> --normalize-timestamps
- option: --generic-markers
- misc fixes with regards to getting the views working
- support marker names
* Update generate-rocpd.py
- Add --marker-mode option
* Update generate-rocpd.py
- Improve debugging of bad bulk SQLite statements
* Update rocprofv3-multi-node.md
- cleanup of proposed SQL schema
* lib/output/format_path.{hpp,cpp}
- rename format to format_path (in config.hpp and config.cpp)
- move format_path functionality to format_path.{hpp,cpp}
* Rework lib/output/tmp_file_buffer.{hpp,cpp}
* Update output_key.cpp
- support %cwd%, %launch_date%
* Rework lib/output/buffered_output.hpp
* Support csv_output_file constructed via domain_type
* Update lib/output/domain_type.{hpp,cpp}
- get_domain_trace_file_name
- get_domain_stats_file_name
* Update lib/rocprofiler-sdk-tool/tool.cpp
- tweak headers
* Update lib/output/generate*.cpp
- remove include of helpers.hpp
- CSV uses domain_type for filenames
* Update samples/counter_collection/per_dev_serialization.cpp
- make wait_on volatile
* Remove tool_table from lib/output and lib/rocprofiler-sdk-tool
- Also split various structs into their own files
- lib/output/agent_info
- lib/output/metadata
- lib/output/kernel_symbol_info
- lib/output/counter_info
- Implemented rocprofiler::tool::metadata
* Optimize rocprofiler_tool_counter_collection_record_t
- reduce the size of the struct from 24784 bytes to 8376 bytes
* Introduced output_config
- split subset of config (from tools library) into output_config to be able to configure the output generating functions separately from the tool library
- this is a significant step towards the output generating functions not relying on static global memory
* Stream chunks of data into output instead of loading all into memory
* Remove duplicate group_segment_size in rocprofiler_kernel_dispatch_info_t serialization
* Adding Q&A to rocprofv3-multi-node.md
* Remove all remaining include lib/rocprofiler-sdk-tool from lib/output
- migrated a fair amount of code from lib/rocprofiler-sdk-tool/helper.hpp to lib/output
* Update Q&A of rocprofv3-multi-node.md
* Fix minor compilation errors + minor cleanup
* Update hsa/async_copy.cpp
- when ROCPROFILER_CI_STRICT_TIMESTAMPS > 0, reduce the active_signal sync wait time
* Update profiling_time.hpp
- fix log messages for when start/end time is less/greater than enqueue/current CPU time
* Fix generate_stats for tool_counter_record_t
* Dictionary optimization for generate-rocpd.py
---------
Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>
[ROCm/rocprofiler-sdk commit: 5eb8c2658c]
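
The `%cwd%` and `%launch_date%` output keys mentioned in the commit above are placeholders in an output path. The sketch below shows one way such substitution could work. The function name is hypothetical and the date format is an assumption; the format actually used by `output_key.cpp` is not stated here.

```python
import os
from datetime import datetime


def expand_output_keys(template, launch_time=None, cwd=None):
    """Replace %cwd% and %launch_date% in an output path template."""
    launch_time = launch_time or datetime.now()
    cwd = cwd if cwd is not None else os.getcwd()
    replacements = {
        "%cwd%": cwd,
        # Assumed format, chosen only so the result is filesystem-friendly.
        "%launch_date%": launch_time.strftime("%Y-%m-%d_%H.%M"),
    }
    for key, value in replacements.items():
        template = template.replace(key, value)
    return template


example = expand_output_keys(
    "%cwd%/out/%launch_date%/trace",
    launch_time=datetime(2024, 4, 26, 0, 29),
    cwd="/tmp/run",
)
```

Passing `launch_time` and `cwd` explicitly keeps the function deterministic, which makes it straightforward to test.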
* Relax timestamp checking
- Prevent recurring CI failures that have no remedy until HSA/driver issues are resolved
* Replace "cc" abbreviation in tests with "counter-collection"
* Update CODEOWNERS to explicitly include jrmadsen for source/include
* Extra logging in rocprofiler tool library
* Tweak aborted-app test
- remove counter collection as part of the test
[ROCm/rocprofiler-sdk commit: 98858b60ec]
* include file and print formatters for OMPT support
* Apply suggestions from code review
* Remove rocprofiler_ompt_set_callbacks
* Reorder ROCPROFILER_EXTERNAL_CORRELATION_REQUEST_OPENMP
---------
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 62e0a9c1a3]
* SDK: create CMake option for strict checks on CPU vs. GPU timestamps
- Configuring CMake with `ROCPROFILER_BUILD_CI_STRICT_TIMESTAMPS=ON` will enable fatal errors if dispatch/memcpy timestamps on GPU are outside of the start/end time from the CPU
- `ROCPROFILER_BUILD_CI_STRICT_TIMESTAMPS` defaults to the value of `ROCPROFILER_BUILD_CI`
* Formatting
* Disable async_copy frequency scaling
* Disable profiling dispatch time frequency scaling
* Support runtime configuration via env variables
- ROCPROFILER_CI_FREQ_SCALE_TIMESTAMPS env variable will enable scaling the timestamps based on the hsa timestamp period
- ROCPROFILER_CI_STRICT_TIMESTAMPS env variable will enable strict timestamp checks
- when cmake is configured with ROCPROFILER_BUILD_CI_STRICT_TIMESTAMPS=ON, this env variable defaults to true
* ROCPROFILER_BUILD_CI_STRICT_TIMESTAMPS defaults to OFF
* Update cmake-target
* Common tracing::adjust_profiling_time
---------
Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>
[ROCm/rocprofiler-sdk commit: ad48201912]
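
The interaction between the build-time option and the runtime environment variable described in the commit above can be modelled as "the environment variable wins if set, otherwise the build-time value is the default". The sketch below illustrates that rule only. The helper names and the set of accepted truthy spellings are assumptions, not the SDK's actual parsing logic.

```python
import os


def env_flag(name, default, environ=None):
    """Read a boolean flag from the environment, falling back to `default`."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    # Accepted spellings are an assumption made for this sketch.
    return raw.strip().lower() in {"1", "true", "on", "yes"}


def strict_timestamps_enabled(build_ci_strict_timestamps, environ=None):
    """Model: ROCPROFILER_CI_STRICT_TIMESTAMPS overrides the build default."""
    return env_flag("ROCPROFILER_CI_STRICT_TIMESTAMPS",
                    build_ci_strict_timestamps, environ)
```

With this rule, a build configured with `ROCPROFILER_BUILD_CI_STRICT_TIMESTAMPS=ON` can still relax the check at runtime by exporting `ROCPROFILER_CI_STRICT_TIMESTAMPS=0`.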
Migrates profiler_serializer class in QueueController to have an instance per-agent instead of one globally. Other changes in this commit are to allow for maps of the queues associated with each agent to be passed to profiler_serializer when it is turned on/off. Existing test cases cover whether or not the kernels are serialized (multistream app). New test case added to show that this serialization only occurs on a per device level with a kernel launched on one device waiting for a value to be set on the other.
[ROCm/rocprofiler-sdk commit: 4a5b1d98c2]
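
To illustrate the effect of moving from one global serializer to one instance per agent, the toy model below holds back a second kernel on the same agent while letting a kernel on a different agent proceed. The class names echo those in the commit message, but the methods and behaviour are a deliberately simplified assumption, not the SDK's code.

```python
from collections import defaultdict, deque


class ProfilerSerializer:
    """Toy model: at most one kernel in flight; the rest wait in FIFO order."""

    def __init__(self):
        self.running = None
        self.waiting = deque()

    def submit(self, kernel):
        if self.running is None:
            self.running = kernel
            return True    # dispatched immediately
        self.waiting.append(kernel)
        return False       # held until the running kernel completes

    def complete(self):
        # Promote the next waiting kernel, if any, and return it.
        self.running = self.waiting.popleft() if self.waiting else None
        return self.running


class QueueController:
    """Toy model: one serializer per agent instead of a single global one."""

    def __init__(self):
        self.serializers = defaultdict(ProfilerSerializer)

    def submit(self, agent_id, kernel):
        return self.serializers[agent_id].submit(kernel)

    def complete(self, agent_id):
        return self.serializers[agent_id].complete()
```

In this model a kernel on agent 0 that is waiting on a value written by a kernel on agent 1 cannot deadlock the second agent, which matches the scenario the new test case is described as exercising.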
* Update tests/bin/transpose/transpose.cpp
- add hipMemGetInfo call to display the available vs. total memory on the GPU
* Update tests/rocprofv3/summary/validate.py
- Updated test_summary_display_data after addition of hipMemGetInfo to transpose test exe
* Tweak code coverage comment uploading
- create unique orphan branch per PR
- reduce quality of PNG files (85 -> 70)
* Revert some of code coverage comment uploading
- remove creation of unique orphan branch per PR
* Tweak code coverage comment uploading
- create unique orphan branch per PR
[ROCm/rocprofiler-sdk commit: 5e1643cf81]
* Bump rocm-docs-core to 1.8.2
* changing certi back to previous version
---------
Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>
[ROCm/rocprofiler-sdk commit: dd71131114]
* Check to force tool to initialize the ctx id to zero.
* initialize rocprofiler_context_id_t with 0 in unit tests
* changelog
---------
Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>
[ROCm/rocprofiler-sdk commit: 3f91d90bbc]