17fdc33d055d23a941fffb0eefc19e9e56f8c281
4 Коммитов
| Автор | SHA1 | Сообщение | Дата | |
|---|---|---|---|---|
|
|
78d8f4b8ea |
SWDEV-492623: Hip Host Function to Device Symbols Mapping (#18)
* Adding changes to register and read symbols from the hip fat binary * adding json output for host_functions * added error handling * adding json tool support * Adding tests * formatting changes * Adding documentation * refactoring as per amd-staging * Adding intializers and changing macros * Fix page-migration background thread on fork (#31) * Fix page-migration background thread on fork After falling off main in the forked child, all the children try to join on on the parent's monitoring thread. This results in a deadlock. Parent is waiting for the child to exit, but the child is trying to join the parent's thread which is signaled from the parent's static destructors. Even with just one parent and child, due to copy-on-write semantics, a child signalling the background thread to join will still block (thread's updated state is not visible in the child). This fix creates background treads on fork per-child with a pthread_atfork handler, ensuring that each child has its own monitoring thread. * Formatting fixes * Detach page-migration background thread and update test timeout * Attach files with ctest * Update corr-id assert * Tweak on-fork, simplify background thread * Revert thread detach * Adding --collection-period feature in rocprofv3 to match v1/v2 parity (#9) * Adding Trace Period feature to rocprofv3 * Adding feature documentation * Update source/bin/rocprofv3.py Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Fixing format * Moving to Collection Period and changing the input params * Format Fixes * Fixing rebasing issues * Removing atomic include from the tool * Adding more options for units, optimizing the code * Fixing rocprofv3.py * Fixing time conv & adding time controlled app * Fixing format * Changing to shared memory testing methodology * use of shmem use * Fix include headers for transpose-time-controlled.cpp * Format upload-image-to-github.py * Removing shmem and using only env var to dump timestamps from the tool * Tool Fixes + Test Config * Adding Tests * Fixing Review comments * Update trace period implementation * Update trace period tests * check between start and stop timestamps * Merge Fix * Update validate.py * Improve safety of rocprofiler_stop_context after finalization * Pass context id to collection_period_cntrl by value * Adding 20 us error margin * Ensure log level for collection-period test is not more than warning --------- Co-authored-by: Ammar ELWazir <aelwazir@amd.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com> * Update lib/rocprofiler-sdk/code_object/hip/code_object.* - move error code check macros to implementation - fix macros which check error code - use constexpr values instead of #define * Update lib/rocprofiler-sdk/code_object/hip/code_object.* - debugging for error that cannot be locally reproduced * Update lib/rocprofiler-sdk/code_object/hip/code_object.* - improve error handling and logging * Update lib/rocprofiler-sdk/code_object/hip/code_object.* - tweak to non-fatal logging messages * Update lib/rocprofiler-sdk/code_object/hip/code_object.* - cleanup of logging messages * Update host kernel symbol register data fields * Update source/lib/rocprofiler-sdk/code_object/hip/code_object.hpp --------- Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com> Co-authored-by: Kuricheti, Mythreya <Mythreya.Kuricheti@amd.com> Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com> Co-authored-by: Ammar ELWazir <aelwazir@amd.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com> |
||
|
|
50b185b9ac |
rocprofv3: PC Sampling Support (#14)
* Adding tool pc sampling support Fixing merge issue tool support on SDKupdates link amd-comgr Sanitizer failure fix fix format Addressing review comments misc fix Adding dispatch id to the CSV output AddingCHANGELOG [ROCProfV3][PC Sampling] Initial ROCProfV3 PC sampling tests for JSON and CSV formats (#17) ROCProfV3 initial tests for JSON and CSV output. Simple kernels that simplify the verification of samples to instruction decoding has been introduced. removing option to enable pc sampling explicitly Adding documentation no pc-sampling option in tests anymore Addressing review comments Updating docs an option for choosing whether all units must be sampled try ignoring PC sampling tests (#36) * run pc-sampling tests on MI2xx runners * use v_fmac_f32 instead of s_nop 0 in tests * fixing docs |
||
|
|
576819a6ed |
SWDEV-495725: Skipping metadata init for unsupported agents (#1194)
* SWDEV-495725: Skipping metadata init for unsupported agents * Update source/lib/output/metadata.cpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> --------- Co-authored-by: mclin <mathew.clin@amd.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> |
||
|
|
5eb8c2658c |
rocprofv3: refactor and reorganize rocprofiler-sdk-tool library (#1138)
* Add rocprofv3-multi-node.md to source/lib/rocprofiler-sdk-tool
* Initial source re-organization
- create "output" static library
* Update include/rocprofiler-sdk/cxx/serialization.hpp
- add GPR count fields to kernel symbol serialization
* Add source/scripts/generate-rocpd.py
- reads one or more JSON output files from rocprofv3 and writes rocpd SQLite3 database
- Note: preliminary implementation
* More reorganization b/t lib/rocprofiler-sdk-tool and lib/output
* Updates to generate-rocpd.py
- add SQL views
- option: --absolute-timestamps -> --normalize-timestamps
- option: --generic-markers
- misc fixes with regards to getting the views working
- support marker names
* Update generate-rocpd.py
- Add --marker-mode option
* Update generate-rocpd.py
- Improve debugging of bad bulk SQLite statements
* Update rocprofv3-multi-node.md
- cleanup of proposed SQL schema
* lib/output/format_path.{hpp,cpp}
- rename format to format_path (in config.hpp and config.cpp)
- move format_path functionality to format_path.{hpp,cpp}
* Rework lib/output/tmp_file_buffer.{hpp,cpp}
* Update output_key.cpp
- support %cwd%, %launch_date%
* Rework lib/output/buffered_output.hpp
* Support csv_output_file constructed via domain_type
* Update lib/output/domain_type.{hpp,cpp}
- get_domain_trace_file_name
- get_domain_stats_file_name
* Update lib/rocprofiler-sdk-tool/tool.cpp
- tweak headers
* Update lib/output/generate*.cpp
- remove include of helpers.hpp
- CSV uses domain_type for filenames
* Update samples/counter_collection/per_dev_serialization.cpp
- make wait_on volatile
* Remove tool_table from lib/output and lib/rocprofiler-sdk-tool
- Also split various structs into their own files
- lib/output/agent_info
- lib/output/metadata
- lib/output/kernel_symbol_info
- lib/output/counter_info
- Implemented rocprofiler::tool::metadata
* Optimize rocprofiler_tool_counter_collection_record_t
- reduce the size of the struct from 24784 bytes to 8376 bytes
* Introduced output_config
- split subset of config (from tools library) into output_config to be able to configure the output generating functions separately from the tool library
- this is a significant step towards the output generating functions not relying on static global memory
* Stream chunks of data into output instead of loading all info memory
* Remove duplicate group_segment_size in rocprofiler_kernel_dispatch_info_t serialization
* Adding Q&A to rocprofv3-multi-node.md
* Remove all remaining include lib/rocprofiler-sdk-tool from lib/output
- migrated a fair amount of code from lib/rocprofiler-sdk-tool/helper.hpp to lib/output
* Update Q&A of rocprofv3-multi-node.md
* Fix minor compilation errors + minor cleanup
* Update hsa/async_copy.cpp
- when ROCPROFILER_CI_STRICT_TIMESTAMPS > 0, reduce the active_signal sync wait time
* Update profiling_time.hpp
- fix log messages for when start/end time is less/greater than enqueue/current CPU time
* Fix generate_stats for tool_counter_record_t
* Dictionary optimization for generate-rocpd.py
---------
Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>
|