007285272be3fe54ba7674f61d695d6f2f84adba
42 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
31fe8858d1 |
rocJPEG API Tracing (#73)
* rocDecode API Tracing support * Test bin file added to rocdecode. Need to add validate python methods * Added option to not make rocDecode tests * Added rocdecode and rocprofv3 tests * Added csv test * Address PR comments. Changed tests to use built-in rocstreambit decoder to remove ffmpeg dependency. Changed cmake option to disable tests rather than not build them. Tests work locally, but will fail until rocDecode is built with tracing enabled on CI * Add option to avoid building rocdecode tests * Added option to avoid building rocdecode bin file * Support for rocJPEG API Trace * Added newline to rocjpeg_version.h * json-tool code added, initial test/bin commit * Formatting * Resolved rocjpeg bin test compilation errors * Tests implemented. Perfetto module currently resulting in errors, so need to retest whenever it is fixed * Formatting and compilation errors * Minor fixes * Copyright year update and minor fixes * Doc update fix * Added rocjpeg csv file in data * Addresses review comments: Updated fixed Findroc.. 
and uses root directory as a hint, fixed documentation error, changed tables to use _CORE, minor style fixes * Added rocdecode and rocjpeg to CI * Removed rocdecode and rocjpeg from CI and added back build tests option * Updated Cmake Files * Added rocDecode and rocJPEG to CI * Remove cmake line added in error * Temporarily modified tests to pass if rocdecode or rocjpeg tracing are not supported for CI, cmake changes * Added find_package for test * Added back use of system rocDecode and rocJPEG, modifies system files to include prefix path * Updated no-link to include INCLUDE_DIR/roc(decode|jpeg), added comments for tests * Resolve merge conflicts and formatting * Added regex find and replace instead of include for CI * VAAPI package causing errors on Vega20 * Removed system rocjpeg and rocdecode use temporarily until cmake issues resolved * Removed workflows regex * Formatting and minor test modification * Modified test for vega20 * Update rocDecode and rocJPEG cmake and tests * Changelog * Fix merge conflict * Added back if-statements around add-tests since cmake-generator-expressions are resulting in errors when the packages are missing * Removed if found statements, replaced with TARGET:EXISTS * Skip json file for rocjpeg and rocdecode tests if not supported * Add os import --------- Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com> Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com> |
||
|
|
470f347e50 |
SDK: remove majority of exceptions (#176)
* SDK: remove majority of exceptions - replace with ROCP_FATAL, ROCP_CI_LOG(WARNING), etc. - improve logging of symbolic link - add --readlink and --realpath (hidden options) to rocprofv3 to follow symlinks for preloaded libraries * Add rocprofv3 --rocm-root argument * Fix registration resolved_exists * Fix rocprofv3_avail.py * Update logging for rocprofiler_configure search - relax failure conditions * Misc clang-tidy fixes * Fix merge * Fix merge --------- Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com> Co-authored-by: Bhardwaj, Gopesh <Gopesh.Bhardwaj@amd.com> |
||
|
|
0fbe6cc7b6 |
SDK: No bg thread if no clients use SDK (#123)
* SDK: No bg thread if no clients use SDK * Update CHANGELOG --------- Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com> |
||
|
|
97b7a6315d |
update copyright date to 2025 (#102)
* Update LICENSE * Update conf.py * Update copyright year * [fix] Update copyright year * Update copyright year "ROCm Developer Tools" * Add license headers to c++ files * Add license to *.py * Update licenses in rocdecode sources --------- Co-authored-by: srawat <120587655+SwRaw@users.noreply.github.com> Co-authored-by: Mythreya <mythreya.kuricheti@amd.com> Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com> |
||
|
|
e307b89ca4 |
rocDecode API Tracing Support (#49)
* rocDecode API Tracing support * Test bin file added to rocdecode. Need to add validate python methods * Added option to not make rocDecode tests * Added rocdecode and rocprofv3 tests * Added csv test * Address PR comments. Changed tests to use built-in rocstreambit decoder to remove ffmpeg dependency. Changed cmake option to disable tests rather than not build them. Tests work locally, but will fail until rocDecode is built with tracing enabled on CI * Add option to avoid building rocdecode tests * Added option to avoid building rocdecode bin file * Merge conflict error * CMake files changed in response to review comments. Attempting to implement callbacks. * Turned off test building for rocdecode * Minor fixes for review comments * Review comments * Updated formatting * Document changes and format.hpp reversion. Need to remove iterate args support for now for later update. * Remove iterate args support * Remove iterate-args * enforce abi versioning in macro if * Fix doc error * removed spaces to fix indentation error --------- Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com> |
||
|
|
78d8f4b8ea |
SWDEV-492623: Hip Host Function to Device Symbols Mapping (#18)
* Adding changes to register and read symbols from the hip fat binary * adding json output for host_functions * added error handling * adding json tool support * Adding tests * formatting changes * Adding documentation * refactoring as per amd-staging * Adding initializers and changing macros * Fix page-migration background thread on fork (#31) * Fix page-migration background thread on fork After falling off main in the forked child, all the children try to join on the parent's monitoring thread. This results in a deadlock. Parent is waiting for the child to exit, but the child is trying to join the parent's thread which is signaled from the parent's static destructors. Even with just one parent and child, due to copy-on-write semantics, a child signalling the background thread to join will still block (thread's updated state is not visible in the child). This fix creates background threads on fork per-child with a pthread_atfork handler, ensuring that each child has its own monitoring thread. 
* Formatting fixes * Detach page-migration background thread and update test timeout * Attach files with ctest * Update corr-id assert * Tweak on-fork, simplify background thread * Revert thread detach * Adding --collection-period feature in rocprofv3 to match v1/v2 parity (#9) * Adding Trace Period feature to rocprofv3 * Adding feature documentation * Update source/bin/rocprofv3.py Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Fixing format * Moving to Collection Period and changing the input params * Format Fixes * Fixing rebasing issues * Removing atomic include from the tool * Adding more options for units, optimizing the code * Fixing rocprofv3.py * Fixing time conv & adding time controlled app * Fixing format * Changing to shared memory testing methodology * use of shmem use * Fix include headers for transpose-time-controlled.cpp * Format upload-image-to-github.py * Removing shmem and using only env var to dump timestamps from the tool * Tool Fixes + Test Config * Adding Tests * Fixing Review comments * Update trace period implementation * Update trace period tests * check between start and stop timestamps * Merge Fix * Update validate.py * Improve safety of rocprofiler_stop_context after finalization * Pass context id to collection_period_cntrl by value * Adding 20 us error margin * Ensure log level for collection-period test is not more than warning --------- Co-authored-by: Ammar ELWazir <aelwazir@amd.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Jonathan R. 
Madsen <jonathanrmadsen@gmail.com> * Update lib/rocprofiler-sdk/code_object/hip/code_object.* - move error code check macros to implementation - fix macros which check error code - use constexpr values instead of #define * Update lib/rocprofiler-sdk/code_object/hip/code_object.* - debugging for error that cannot be locally reproduced * Update lib/rocprofiler-sdk/code_object/hip/code_object.* - improve error handling and logging * Update lib/rocprofiler-sdk/code_object/hip/code_object.* - tweak to non-fatal logging messages * Update lib/rocprofiler-sdk/code_object/hip/code_object.* - cleanup of logging messages * Update host kernel symbol register data fields * Update source/lib/rocprofiler-sdk/code_object/hip/code_object.hpp --------- Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com> Co-authored-by: Kuricheti, Mythreya <Mythreya.Kuricheti@amd.com> Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com> Co-authored-by: Ammar ELWazir <aelwazir@amd.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com> |
||
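The fork fix described in the commit message above (#31) gives each forked child its own monitoring thread through a `pthread_atfork` handler. The following is only a sketch of that idea in Python, not the SDK's C++ implementation: it swaps in `os.register_at_fork`, Python's counterpart to `pthread_atfork`, and the `Monitor` class, its methods, and `run_child` are hypothetical names introduced for illustration. It assumes a POSIX system where `os.fork` is available.

```python
import os
import threading


class Monitor:
    """Hypothetical stand-in for a background monitoring thread."""

    def __init__(self):
        self._start()

    def _start(self):
        # Fresh event and thread owned by the current process.
        self.stop_event = threading.Event()
        self.owner_pid = os.getpid()
        self.thread = threading.Thread(target=self.stop_event.wait, daemon=True)
        self.thread.start()

    def restart_in_child(self):
        # After fork, only the forking thread exists in the child, so the
        # inherited thread handle refers to a thread that never runs here.
        # Joining it would block; create a per-child thread instead.
        self._start()

    def shutdown(self):
        # Signal the thread owned by this process and wait for it to exit.
        self.stop_event.set()
        self.thread.join(timeout=5)
        return not self.thread.is_alive()


monitor = Monitor()
# Analogous to registering a child handler with pthread_atfork.
os.register_at_fork(after_in_child=monitor.restart_in_child)


def run_child():
    """Fork, shut the monitor down in the child, and return its exit code."""
    pid = os.fork()
    if pid == 0:
        ok = monitor.owner_pid == os.getpid() and monitor.shutdown()
        os._exit(0 if ok else 1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

Without the fork handler, the child's `shutdown` would try to join a thread that exists only in the parent, which is the blocking behaviour the commit message describes.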
|
|
00c46fd5e5 |
SDK: OMPT Support (#22)
* Ability to select alternative compiler per file
Implementation of ompt interface to rocprofiler SDK. task_create and task_schedule are not supported.
Misc updates
Update OpenMP target sample
- samples/ompt -> samples/openmp_target
- fix sample test of openmp-target
- reorganize files
Rework OpenMP implementation
Minor OpenMP implementation cleanup
Rename samples/openmp_target CMake targets
Add tests/bin/openmp
- OpenMP target test app in tests/bin/openmp/target
Format samples/openmp_target CMakeLists.txt
Misc lib/rocprofiler-sdk/openmp cleanup
- fix includes
- convert_arg
Update openmp.def.cpp
- tweak includes
- remove lots of temporary variables
Update samples
- common::get_callback_id_names() -> common::get_callback_tracing_names()
- add kernel dispatch, memory copy, scratch memory buffered tracing to openmp target sample
Fix code object operation names
- add "CODE_OBJECT_" prefix
Update include/rocprofiler-sdk/openmp/api_id.h
- remove spurious comment
Miscellaneous openmp updates
- similar API for openmp_begin and openmp_end
- move implementations of ompt callbacks to openmp.cpp
- ompt_{thread_begin,thread_end,parallel_begin,parallel_end}_callbacks are openmp_events
[SWDEV-484495] Fix int truncation in CSV output (#1098)
CSV output truncates doubles to ints when it shouldn't. Derived metrics
are (mostly) doubles and lose precision (or become worthless) if treated
as an int. Converted these to double to match the format we return from
rocprof-sdk.
Co-authored-by: Benjamin Welton <ben@amd.com>
Update limit for max counter records in rocprof-tool (#1073)
A fixed sized std::array is used to store counter records in rocprofiler SDK. This limit was breached in SWDEV-484742. Upping the limit to 512 to be less likely to reach this limit again.
adding proxy ompt_data_t * arguments
fixes for proxy pointers
- Implement proxy ompt_data_t* pointers for clients
- Add ompt_data_t* arguments back to callback API
- Modify openmp sample to illustrate use of proxy pointers
formatting
SWDEV-467350: Skipping tool counter iteration for unsupported hardware (#1083)
Fixing some accumulate metrics (#1089)
* Fixing some accumulate metrics
* Fixing some more accumulate metrics
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
updating rocprofv3 help options (#1113)
* updating rocprofv3 help options
* updating CHANGELOG
Fixing installed package tests in CI (#1119)
* Fixing installed package tests in CI
* Formatted rocprofv3.py with black formatter
SWDEV-488948: PC Sampling - Correlation class to provide some thread safety. Adding multithread tests. (#1112)
* SWDEV-488948: PC Sampling - Correlation class to provide some thread safety. Adding multithread tests.
* Update source/lib/rocprofiler-sdk/pc_sampling/parser/correlation.hpp
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
* Update source/lib/rocprofiler-sdk/pc_sampling/parser/correlation.hpp
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
* Adding backlog for codeobj changes
* Formatting
* Update source/lib/rocprofiler-sdk/pc_sampling/code_object.hpp
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
* Update source/lib/rocprofiler-sdk/pc_sampling/code_object.hpp
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
---------
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
SWDEV-487621: Fixes for metric definitions (#1118)
* Fixes for metric definitions
* Removing gfx8
* Update changelog
* Fixing unit tests
* Small fixes
* Fix for write size
Fix PSDB change (#1120)
Reverts change to `source/include/rocprofiler-sdk/callback_tracing.h`
from commit
|
||
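The CSV fix squashed into the commit above (#1098) converts counter values from ints to doubles so that derived metrics keep their precision. A minimal sketch of the difference, written in Python for illustration rather than taken from the tool's C++ source; the column headers, counter names, and `write_counter_csv` helper are all hypothetical:

```python
import csv
import io


def write_counter_csv(rows, as_int):
    """Write (name, value) rows as CSV text.

    as_int=True mimics the truncating behaviour; as_int=False keeps doubles.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Counter_Name", "Counter_Value"])  # hypothetical headers
    for name, value in rows:
        writer.writerow([name, int(value) if as_int else float(value)])
    return buf.getvalue()


# Hypothetical derived metrics whose values are fractional.
rows = [("OccupancyRatio", 0.75), ("AvgWaves", 12.5)]
truncated = write_counter_csv(rows, as_int=True)
preserved = write_counter_csv(rows, as_int=False)
```

A ratio such as 0.75 collapses to 0 when written as an int, which is why a truncated derived metric can become worthless rather than merely imprecise.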
|
|
249c50fc40 |
Runtime Initialization Tracing (#1105)
* Runtime initialization tracing - callbacks and buffer entries notifying when a runtime has been initialized * Minor cleanup to registration.cpp * JSON tool implementation * Increase perfetto_reader timeout * Handle perfetto_reader timeout when attr doesn't exist * clang-tidy fixes to memory_allocation.cpp |
||
|
|
3bd7773cf7 |
Memory Allocation Tracking (#1142)
* Initial commit: Need to implement wrapper function to collect data and test that wrapper function is correctly replacing core HSA functions * Attempted to implement wrapper implementation for hsa memory allocation functions. Need to modify generate record files and test if implementation is working as expected * Debugging and implementing generateCSV function * Memory allocation size and starting address outputted to csv and json file formats * Formatting * Initial setup for OTF2 and Perfetto generation * Collecting agent id for memory_allocation and formatting * Modified memory_allocation.cpp to set up code for AMD_EXT commands * Support for memory_pool_allocate added * Removed accidently added file * Made flag optional and added more OTF2 and Perfetto code. Needs testing to ensure perfetto and OTF2 works * Formatting * Fixed perfetto and otf2 output * Fixed flag issue due to incorrect buffer use * Updated documentation * Small cleaning and comments * Added test for HSA memory allocation tracing * Fixed summary test validation errors due to allocation tracing. Added type to location_base to create unique event ids for allocation due to OTF2 trace error * Decreased lower limit of hip calls for test * Modified summary tests to vary number of allocate requests * Minor fixes to address comments. Still need to address OTF2 comments * Fix docs and changed OTF2 to use enum for type specified in location_base construction * Fixed schema error * Added vmem command tracking. Need to add test * Updated test to work with vmem command and updated generateCSV to output int instead of hex string. * OTF2 enum update and mispelling fix * CI does not support Virtual Memory API. Removed vmem test. 
Will add back if CI is modified to support vmem API * Update CMakeLists.txt for memory allocation test * Updated summary test * Minor fixes to address comments * Moved domain_type.hpp enum to before LAST * Fixed compile errors and formatting * Fixed stats summary domain name error * Added rocprofv3 test * Page migration test fix * Undo page migration test changes. Failures do not appear to have to do with memory allocation |
||
|
|
bb69467765 |
Renamed agent profiling service to device counting service (#1132)
* Renamed agent profiling service to device counting service
Name more aptly represents what agent profiling did (device wide
counter collection). Conversion of existing user code can be
performed by the following find/sed command:
find . -type f -exec sed -i 's/rocprofiler_agent_profile_callback_t/rocprofiler_device_counting_service_callback_t/g; s/rocprofiler_configure_agent_profile_counting_service/rocprofiler_configure_device_counting_service/g; s/agent_profile.h/device_counting_service.h/g; s/rocprofiler_sample_agent_profile_counting_service/rocprofiler_sample_device_counting_service/g' {} +
* Converted dispatch profile to dispatch counting service
* Debug for functional counters test
* Minor changes for CI
* Minor fix
* More fixes for CI
* Update evaluate_ast.cpp
---------
Co-authored-by: Benjamin Welton <ben@amd.com>
|
||
|
|
320427b5f5 |
rocprofv3: docs and help menu updates (#1129)
* doc updates * Correcting ROCtx information * Making ROCTx string consistent * missing occurrence |
||
|
|
2a146259c7 |
Add support for RCCL tracing (#1047)
* [Draft]: Add support for RCCL tracing Address comments * [Draft]: Add support for RCCL tracing Address PR comments, changes from RCCL upstream * Add RCCL library table registration Working on adding support to rocprofiler-register * Support compilation w/o <rccl/amd_detail/api_trace.h> - dummy api_trace.h header - return ROCPROFILER_STATUS_ERROR_NOT_IMPLEMENTED when RCCL does not have api_trace.h header * RCCL API tracing tool support - add to rocprofv3 - add to json-tool --------- Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com> |
||
|
|
5d54682468 |
Misc cleanup and stale code removal (#1026)
* Remove custom allocators - remove unused lib/rocprofiler-sdk/allocator.* - remove unused lib/rocprofiler-sdk/context/allocator.hpp * Fix rocprofiler_strip_target (rocprofiler_utilities.cmake) * Remove old HSA_TOOLS_LIB support - remove OnLoad/OnUnload functions used by HSA_TOOLS_LIB env variable * Fix linter warnings + specific NOLINT exceptions - replace bare NOLINT with NOLINT(<warning-name>) |
||
|
|
fa1b9e67ab |
ATT Agent fixes and improvements (#1011)
* Tidying ATT dispatch API. ATT Agent to be initialized with rest of profiler. Removing read_index-based wait. * Formatting * Adding some input validation * Add perf test for agent * Removing async |
||
|
|
dc671497da |
look for symbols in dynsym table (#990)
* look for symbols in dynsym table * checking both symtab and dynsym * Avoid symbol duplication in non stripped binaries * clang-format * Minor elf_utils.cpp updates - use 'else if' instead of 'if' - logging tweaks * Update registration - tweak logging * Update testing - strip the rocprofiler-sdk-c-tool library - add test-c-tool-rocp-tool-lib-execute test which does NOT LD_PRELOAD the library (uses only ROCP_TOOL_LIBRARIES instead) --------- Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com> |
||
|
|
2be3543c7b |
Parse ELF format for rocprofiler_configure symbol (#970)
* Parse ELF format to search for rocprofiler_configure * Use ELF parsing in registration |
||
|
|
8da0c35079 |
Adding wrappers on HSA for executable load/unload and allowing multiple agents per context on ATT (#951)
* Codeobj wrappers around HSA calls for ATT * Formatting * Bookkeeping * Tidy * Tidy * Update source/lib/rocprofiler-sdk/thread_trace/code_object.hpp Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com> * Update source/lib/rocprofiler-sdk/thread_trace/att_core.hpp Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com> * Variable naming --------- Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com> |
||
|
|
62ec95eae6 | Sync queue and async copy on client finalizer (#950) | ||
|
|
987ae3cc47 |
PC Sampling Support (#715)
* cmake formatting (cmake-format) (#188) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * source formatting (clang-format v11) (#189) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: design of the pc sampling data struct; guarding parts of code that uses ROCr marker packets * source formatting (clang-format v11) (#191) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * cmake formatting (cmake-format) (#192) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: shadow variable fix * pcs: fix for compiler errors reported by CI/CD * source formatting (clang-format v11) (#193) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: docs fix; samples uses rocprofiler::rocprofiler library * cmake formatting (cmake-format) (#195) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: client in samples folder fixed * pcs: client requires rocprofiler package as dependency * pcs: client uses single context * source formatting (clang-format v11) (#196) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: client using single buffer; no buffer destroy in client * pcs: client::setup explicitly called from the example * pcs: rocprofiler_pc_sample_record_t updated * pcs: fixed init of external correlation id * source formatting (clang-format v11) (#198) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: remove outdated files; update CMakeLists * cmake formatting (cmake-format) (#212) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: using rocprofiler_agent_id_t * pcs: Removing trailing whitespaces Co-authored-by: Jonathan R. 
Madsen <jrmadsen@users.noreply.github.com> * source formatting (clang-format v11) (#214) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: mapping agent_id to the agent * source formatting (clang-format v11) (#215) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: const while iterating over agents * source formatting (clang-format v11) (#216) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: calling get_buffer instead of get_buffers * pcs: workgroup typo * pcs: documentation for the public PC sampling API * pcs: queue_cb_t signature adaptation * pcs: mocks removed * pcs: updating HsaApiTable with HSA/ROCr PC sampling API * pcs: querying available PC sampling configs through IOCTL * pcs: create the PCS session in IOCTL * pcs: first actual PC samples delivered to the rocprofiler's client :) * pcs: works with marker packet too * pcs: using HSA table to call pc sampling related functions * pcs: using ioctl instead of kfd in naming * pcs: configuration service test fixed * pcs: sample processing test fixed * pcs: marker packet macro wrapper removed * pcs: marker packet is part of the rocprofiler_packet union * pcs: one fixme added * pcs: client that uses pc-sampling and code obj tracing * pcs: client that supprts PC sampling and code obj tracing refactored * pcs: show more info for each PC sample * pcs: hex output for the samples that do not belong to the matmul kernel * pcs: querying avail configuration happens immediately before configuring * pcs: hsa_ven_amd_pcs_create_from_id renamed * pcs: using hsa_stop; accessing a buffer by id from parser * pcs: includes reworked, tests returned to life * pcs: rocrofiler dir removed as outdated * cmake formatting (cmake-format) (#271) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * source formatting (clang-format v11) (#272) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: some warnings fixed * source formatting (clang-format v11) (#273) 
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * cmake formatting (cmake-format) (#274) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: show MI200 relevant information in the sample * pcs: queue cb fixed; rocr.h include fixed * source formatting (clang-format v11) (#296) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: getting hsa_agent and the doorbell_id from hsa_queue * source formatting (clang-format v11) (#297) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: correlation ID logic fixed * source formatting (clang-format v11) (#303) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: pure pc sampling example fixed * source formatting (clang-format v11) (#307) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * cmake formatting (cmake-format) (#308) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: interval value if the PC sampling is already configured * pcs: ROCPROFILER_STATUS_ERROR_PC_SAMPLING_ALREADY_CONFIGURED New status code if another process configured PC sampling service with different configuration. Samples are extended to consider this case and retry if it happens. 
* pcs: hsa_amd_queue_get_info mocked in tests * source formatting (clang-format v11) (#328) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs (tests): query configs after configuring service * source formatting (clang-format v11) (#329) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: sample checks workgroup_id_* and wave_id * source formatting (clang-format v11) (#330) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs samples: running samples on the device 0 * pcs: kfd_ioctl updated * pcs: ioctl config struct changed fields names * pcs: status when PC sampling is configured by another process is renamed * pcs: HSA PC sampling API table fixed * pcs: tmp hack to be able to use HSA pc sampling table * source formatting (clang-format v11) (#443) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs service use CIDs generated by HIP API tracing service * source formatting (clang-format v11) (#455) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * cmake formatting (cmake-format) (#456) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: CID manager * pcs: explicit flush with no delivered data executes retirement logic * source formatting (clang-format v11) (#464) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: rocprofiler_query_pc_sampling_agent_configurations docs update * source formatting (clang-format v11) (#465) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: rocprofiler_configure_pc_sampling_service docs update * pcs: explicit sync introduced in PCSCIDManager * pcs: new logic for retiring CIDs in PC sampling service documented * pcs: queue interception cb signature updated * source formatting (clang-format v11) (#471) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: if no agents supports PC sampling, fail gracefully * elaborating when KFD returns EBUSY and EEXIST * pcs: the second PC sampling examples fails 
gracefully * code samples use only single kernel for now * pcs: CID manager refactored * source formatting (clang-format v11) (#481) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: ioctl update * source formatting (clang-format v11) (#531) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs:code sample to test PC sampling applied on concurrent kernels * source formatting (clang-format v11) (#533) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: pc sampling strest test included * cmake formatting (cmake-format) (#539) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * source formatting (clang-format v11) (#540) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: standalone benchmark * cmake formatting (cmake-format) (#555) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: glance in external correlation IDs * source formatting (clang-format v11) (#557) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * another change in ioctl interface * pcs: update queue interceptor callbacks and samples accroding to the agent 0 version * source formatting (clang-format v11) (#611) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: avoid running problematic PC sampling test * pcs: guarding tests not to fail on architectures not supporting PC sampling * source formatting (clang-format v11) (#617) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: check IOCTL version prior to each KFD call * pcs: ioctl refactoring * pcs: PC sampling service increases the ref_count of the correlation ID of the kernel dispatch * cmake formatting (cmake-format) (#631) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * source formatting (clang-format v11) (#632) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: PC sampling service 
provides external correlation IDs * source formatting (clang-format v11) (#644) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: use rocprofiler_dim3_t for workgrou_ip * source formatting (clang-format v11) (#645) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: minor fixes * pcs: updating the documentation for the pc sampling API functions * pcs: api table and queue controller fix * pcs: don't generate marker packets for the agent if PC sampling is not configured on it * pcs: multi-GPU and single-GPU clients * source formatting (clang-format v11) (#700) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: warning and errors fixed * source formatting (clang-format v11) (#702) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: clang compiler errors and warnings fixed * source formatting (clang-format v11) (#716) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: const reference in cid manager * source formatting (clang-format v11) (#717) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: const & func in manager explicit * pcs: test to cover creating PC sampling service of agent that does not exist * pcs: generate marker packets if service is active * source formatting (clang-format v11) (#719) Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com> * pcs: refactoring hsa_adapter; use the correlation_id->thread_idx * Update source/lib/rocprofiler-sdk/pc_sampling/cid_manager.cpp * Update source/lib/rocprofiler-sdk/pc_sampling/cid_manager.cpp * Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp * Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp * Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp * Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp * Update source/lib/rocprofiler-sdk/pc_sampling/utils.cpp * Update utils.cpp * moving pc-sampling 
tests and samples to pc-sampling label * Format fix * pcs: use configured instead of active service * Update source/lib/rocprofiler-sdk/pc_sampling/service.cpp * pcs: ensure configuring PC sampling on the HSA level is called only once * pcs: minor fix * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * pcs: refactoring IOCTL integration * Update source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com> * Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.hpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * pcs: reverting back what bot doubled * Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * pcs: retesting the bot * Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * pcs: why bot fails on this IOCTL status * pcs: why failing on <vector> * Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * pcs: returning 
commits removed by bot * pcs: formatting locally * pcs: clients are flushing buffers inside the tool_fini * pcs: sync function in public API * pcs: sync prior to unloading the code object * pcs: sync function requires context * pcs: client uses CID retirement service * pcs: test for flushing internal ROCr buffers * pcs: source formatting * Update source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * pcs: code samples refactoring * pcs: public API header refactored * pcs: rocprofiler_buffer_flush drains internal PC sampling buffers too * pcs: remove unnecessary functions * pcs: do not call hsa's copytables * pcs: include reordering * pcs: using ROCP_ERROR inside PC sampling implementation * pcs: pc_sampling sample uses ostream instead of printfs * pcs: pc_sampling_codeobj tracing using ostream instead of prints * pcs: registering once for interceptor callbacks * pcs: do not generate internal CIDs if not in debug mode * pcs: rebasing fixed; missing external correlation IDs * pcs: code formatting * enable kernel tracing service to receive external correlation IDs * pcs: using ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL * pcs: polishing parser * formatting * updating parser to use workgroup_id * kfd_ioctl.h extracted in details folder * refactoring * pcs: preparing to generate code object information * flush internal buffers prior to unloading code object * pcs: generating marker records * pcs: wrap code_object's shutdown function * ROCR_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES unsupported at the moment * documenting the ignorance of ROCR/HIP_VISIBLE_DEVICES * pcs: separate structs for code object loading/unloading markers * pcs: inst_pkt_t changed the namespace * pcs: removing wrapper around the shutdown function * pcs: size in record field * pcs: documentation refactoring + typedefs * renaming PCSAgentConfig to PCSAgentSession * pcs: service does not keep a pointer to 
the context * pcs: static assertions related to the versioning * pcs: rocprofiler_pc_sampling_configuration_t size field * pcs: report API unimplemented unless explicitly enabled * pcs: skip tests if KFD does not support PC sampling * pcs: if ROCr hides some devices, no PC samples will be delivered for it * pcs: hip error check after kernel launch * formatting * removing PCS info from agent.h * fix based on review * Update continuous integration workflow - use mi200 runner for code coverage (supports PC sampling) - split sanitizer jobs across navi3, vega20, and mi300 * Updating pc sampling test labels * ROCP_PC_SAMPLING_ENABLED env in CI * ROCP_PC_SAMPLING_ENABLED for all CI mi200 jobs * Rearrange sanitizer assignments * fixes according to review * removed unused functions * pcs: rocprofiler_agent_id_t instead of handle as a key in map * Update source/lib/rocprofiler-sdk/context/context.hpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * removing drm_fd from the agent.h * pcs: removing one sample due to complexity * pcs: refactoring sample * simplifying sample * new lines * Improve queue_control enable interceptor logic * Update lib/rocprofiler-sdk/hsa/types.hpp - handle amd_ext size for HSA 1.12.0 * ROCP_PC_SAMPLING_ENABLED -> ROCPROFILER_PC_SAMPLING_BETA_ENABLED * Update hsa_adapter.cpp - anonymous namespace + remove debug * parser update * Apply suggestions from code review --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> Co-authored-by: vlaindic <vladimir.indic@amd.com> Co-authored-by: vlaindic <vlaindic@amd.com> Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com> Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com> Co-authored-by: gobhardw <gopesh.bhardwaj@amd.com> Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com> |
||
|
|
28e6430d04 |
[2/N] Agent Counter implementation with unit tests to check functionality (#846)
Agent Counter Collection API with tests and samples. --------- Co-authored-by: Benjamin Welton <ben@amd.com> |
||
|
|
4d5b71b0e7 |
Update logging (#838)
* Update logging * Remove unused function * Fix lib/rocprofiler-sdk/hsa/pc_sampling.cpp logging compilation * Fix logging FLAGS_vmodule string leak and numerical log level * Update logging * Update glog submodule * Leak fixes * format |
||
|
|
733aa8e438 |
Restructure code object source code (#826)
* public codeobj info * Restructure code object source code file layout * Update get_unloaded_code_objects + add iterate_loaded_code_objects * Remove get_unloaded_code_objects from visible internal API - iterate_loaded_code_objects + functor which filters on the hsa_executable_t effectively reproduces this behavior * Whitespace removal --------- Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com> |
||
|
|
deabd869b5 |
Introducing PcSamplingExtTable (#735)
* pcs: updating the PCS table * Fixing Clang Tidy errors * pcs: reverting old table version * testing wrong table size * new size * testing step * reverting old steps * hsa_amd_queue_get_info introduced * pcs: testing table version * formatting * removing redundant declarations * removing unnecessary files * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Update source/lib/rocprofiler-sdk/hsa/pc_sampling.cpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Enable function pointer offset check in hsa::pc_sampling::copy_table - add offset() to HSA_API_META_DEFINITION - check if offset() >= size of struct * Support build without PC sampling API table * ids for ROCr's PC sampling public functions --------- Co-authored-by: Ammar ELWazir <aelwazir@hpe6u-21.amd.com> Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com> |
||
|
|
fd3d97287c |
Page migration reporting (#651)
* Page migration reporting support * Page migration: Update parser and reporting Container does not have latest KFD header, so CI might fail * Add kfd_ioctl.h * Formatting * Update get_key - get key was not used (and shouldn't be), so delete it * clang-tidy fixes * Tests for page migration * Apply suggestions from code review Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Update tests/bin/page-migration/CMakeLists.txt Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Update page-migration test app - add hipHostRegister to register mmap'ed allocation with HIP - misc cleanup and reorg - remove HSA_XNACK=1 from test env * Update lib/rocprofiler-sdk/tests/page_migration.cpp - fix compilation error * Minor updates (reorg, rename) * Page migration reporting support * Page migration: Update parser and reporting Container does not have latest KFD header, so CI might fail * Update page migration tests, fix trigger types * Page Migration Tracing Support Refactoring (#753) * Reorganization * Update page migration init/fini * Formatting * Update page_migration.cpp - change logging severity * Skip test if KFD does not support page migration reporting * Rework skipping test if KFD does not support page migration * Fix event trigger enum values * Fix clang-diagnostic-unused-const-variable --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com> Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com> |
||
|
|
e2d8ccad4b |
adding pandas and pytest to requirements.txt (#748)
* adding pandas and pytest to requirements.txt * setting up requirements.txt * Update requirements - formatting packages - remove packages not directly used by rocprofiler-sdk * Update cmake formatting, linting, and options - if BUILD_CI -> force BUILD_DEVELOPER and BUILD_WERROR - support python installed clang-format and python installed clang-tidy * Update build.sh - split into install-deps.sh and install-apt-deps.sh * Improve code coverage --------- Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com> |
||
|
|
4fa165ec1a |
Add support for scratch reporting (#523)
* Add ToolsApiTable Add ToolsApiTable wrapping for scratch memory tracking * Add initial support for scratch memory tracking Buffering is implemented * cmake formatting (cmake-format) (#525) Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com> * source formatting (clang-format v11) (#524) Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com> * Add callback tracing for scratch Fixed the error where scratch tracking init was called irrespective of whether any client requested for it * Apply suggestions from code review Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com> * Fix tools api copy/update Table were saved/updated incorrectly in previous commit. Also adds passing user data through the callback * Fix OpKind sequence for scratch tracking Previously scratch was using OpKind from rocprofiler-sdk, but templates were instantiated using API ID. These differ by 1 * Integration tests for scratch reporting Added buffer and callback integration tests for scratch reporting * source formatting (clang-format v11) (#550) Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com> * cmake formatting (cmake-format) (#551) Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com> * python formatting (black) (#549) Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com> * CI fixes * source formatting (clang-format v11) (#554) Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com> * Update api Rebase on main and updates based on PR feedback * Update scratch reporting and address PR comments - Added agent id to buffer records - Updated `test_internal_correlation_ids` - Is almost identical to one in async-copy - Updated scratch test to check for agent id - Updated queue id serialization in callback records (prints handle as nested key) - Remove `marker_api_traces` from scratch `test_internal_correlation_ids` validation test - Rename `amd_tools_api` to `scratch_memory` - Added doxygen 
comments - Remove scratch callback from `tool.cpp` - Replace assert with `LOF_IF` in `scratch_memory.cpp` * Update tools table Changed to match up with changes to hsa tables in main branch * Rework scratch memory structure * Update tests - Added suggestions from PR review, and updated tests accordingly * Misc cleanup * Update scratch test As of Apr 4th, `hsa_amd_agent_set_async_scratch_limit` is disabled. Note, > This API: `hsa_amd_agent_set_async_scratch_limit` is currently > disabled. We need some changes in CP firmware to be able to do this > and these changes are not ready yet. > With the current code, you will also not get notifications for > alternate-scratch allocations because this feature has been disabled > while CP firmware is making additional changes > We are hoping to have that feature enabled by ROCm-6.3 * Minor update to lib/rocprofiler-sdk/internal_threading.* - delay destruction of shared_ptrs of the tasks to prevent rare (but possible) data race on the destruction of the shared_ptr --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com> Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com> Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com> |
||
|
|
41c0ddd72d |
Convert LOG() -> ROCP_X logging macros. (#695)
* Convert LOG() -> ROCP_X logging macros. This patch converts the LOG() macro to the ROCP_X logging macros. There are the following levels of logs.
Logs whose expressions are not evaluated unless the log level is enabled:
- ROCP_TRACE - VLOG(2) (enabled by env variable GLOG_v=2)
- ROCP_INFO - VLOG(1) (enabled by env variable GLOG_v=1)
Logs whose expressions are always evaluated:
- ROCP_WARNING - LOG(WARNING)
- ROCP_ERROR - LOG(ERROR)
- ROCP_FATAL - LOG(FATAL)
- ROCP_DFATAL - DLOG(FATAL) (only fatal in debug mode)
* source formatting (clang-format v11) (#696) Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>
* Minor fix
* Fixes for VLOG before main
* fix vmodule
* source formatting (clang-format v11) (#718) Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>
* memory leak fix
* Vlog change
--------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com> |
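The distinction this commit draws between log levels whose expressions are evaluated only when enabled and levels whose expressions are always evaluated can be illustrated with a small analogue. The sketch below is not the ROCP_X implementation (those are C++ macros over glog); it uses Python's standard `logging` module, and `log_info_lazy`, `log_warning_eager`, and `expensive_summary` are hypothetical names chosen for illustration.

```python
import logging

# Hypothetical logger for this sketch; not part of rocprofiler-sdk.
logger = logging.getLogger("rocp_sketch")
logger.addHandler(logging.NullHandler())  # keep the sketch silent


def expensive_summary(calls):
    """Stand-in for a costly log argument; records each evaluation."""
    calls.append("evaluated")
    return "summary"


def log_info_lazy(make_message):
    """Analogue of a level-gated macro: the message expression is wrapped
    in a callable and only evaluated if INFO is enabled."""
    if logger.isEnabledFor(logging.INFO):
        logger.info(make_message())


def log_warning_eager(message):
    """Analogue of an always-evaluated level: the caller has already
    computed the message before this function runs."""
    logger.warning(message)
```

With the logger level set to WARNING, `log_info_lazy(lambda: expensive_summary(calls))` never calls `expensive_summary`, whereas `log_warning_eager(expensive_summary(calls))` always pays for the evaluation.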
||
|
|
939e23e9d1 |
Stop all client contexts prior to finalization (#721)
* Stop all client contexts prior to finalization
* Update lib/common/container/static_vector.hpp
- improve emplace_back for non-{move,copy}-assignable object
* Update samples/intercept_table/client.cpp
- improve robustness against static object destruction
* Update lib/rocprofiler-sdk/context/context.cpp
- change storage of registered context array
- stable_vector of optional contexts
- common::static_object wrapper around stable_vector
* Update samples/intercept_table/client.cpp
- use variable template for underlying function pointer
|
||
|
|
bc9f86ec62 |
Update HSA copy table (#687)
- two copies of HSA table: internal and tracing - internal is used to invoke HSA function without any possibility of triggering tracing, etc. |
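The idea of keeping two copies of a dispatch table, one internal and one for tracing, can be sketched as follows. This is a conceptual Python analogue under stated assumptions, not the SDK's C++ code: `ApiTable`, its single `init` entry, and `make_tables` are hypothetical names. The internal copy keeps the original function pointers, so calling through it can never trigger tracing.

```python
import copy
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ApiTable:
    """Hypothetical one-entry dispatch table."""
    init: Callable[[], str]


def make_tables(runtime_table: ApiTable,
                trace_log: List[str]) -> Tuple[ApiTable, ApiTable]:
    """Return (internal, tracing) copies of the runtime table."""
    internal = copy.copy(runtime_table)  # pristine, never wrapped
    tracing = copy.copy(runtime_table)

    def traced_init() -> str:
        trace_log.append("init")   # tracing side effect
        return internal.init()     # forward via the untraced copy

    tracing.init = traced_init
    return internal, tracing
```

Code inside the profiler would call `internal.init()`, while the application-facing table would be the `tracing` copy.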
||
|
|
7b6d3c70bd |
Shared Library Constructor (rocprofv3 deadlock fix) (#599)
* Moved tests/apps to tests/bin * Renamed cmake project in tests/bin * Update samples - Use ROCPROFILER_DEFAULT_FAIL_REGEX - tweaks to stdout messages * Update tests - Use ROCPROFILER_DEFAULT_FAIL_REGEX * Add tests/lib - libraries with HIP code * Update PTL submodule - remove atexit delete of thread_id_map * Update cmake/rocprofiler_options.cmake - Set ROCPROFILER_DEFAULT_FAIL_REGEX * Update common lib: env + logging - improved customization of logging settings - default to disabling logging to files - install failure handler for rocprofv3 - set_env support in environment.* * Add lib/rocprofiler-sdk/shared_library.cpp - shared library constructor * Update lib/rocprofiler-sdk-tool/tool.cpp - destructor thread safety - convert callback_name_info and buffered_name_info to pointers - install failure handler for logging * Add tests/bin/hip-in-libraries - hip-in-libraries is an exe which uses two shared libraries where each shared library contains HIP kernels - used for testing deadlocking within __hipRegisterFatBinary * Update bin/rocprofv3 - reorganized the env variables - use exec to launch command - set ROCPROFILER_LIBRARY_CTOR=1 * Add tests/rocprofv3/tracing-hip-in-libraries - uses hip-in-libraries exe for exe which uses shared libraries to launch HIP kernels * Update bin/rocprofv3 - fix counter collection (no exec) * Update lib/rocprofiler-sdk-tool/tool.cpp - replace "Kernel-Name" with "Kernel_Name" * Update lib/rocprofiler-sdk/registration.cpp Use RTLD_LOCAL instead of RTLD_GLOBAL for env libraries * Update tests/rocprofv3 - replace "Kernel-Name" with "Kernel_Name" * Update tests - vector-ops (bin) stream syncs + runs with 4 queues per device - improve counter-collection/input1 validation - rocprofv3/tracing-hip-in-libraries does not do sys-trace - improved validation script for tracing-hip-in-libraries - updated dispatch_callback in json-tool.cpp following reworking of prototypes for counter collection * Update samples/counter_collection - updated 
dispatch_callback(s) and record_callback(s) following reworking of prototypes * Update bin/rocprofv3 - reorganized help menu - added options for sub-HSA tables - added --hip-runtime-trace - changed --hip-trace to include --hip-compiler-trace * Update lib/rocprofiler-sdk-tool - improved kernel filtering - removed arch_vgpr, accum_vgpr, sgpr code (in rocprofiler-sdk) - fixed issue with counter-collection w/o tracing - added support for fine grained HSA API tracing - removed directly linking to HSA-runtime * Update lib/rocprofiler-sdk/agent.cpp - rocp_agents != hsa_agents is non-fatal when ROCPROFILER_BUILD_CI=OFF (CMake option) * GPR (vector and scalar) info in kernel symbol data - rocprofiler_callback_tracing_code_object_kernel_symbol_register_data_t contains general purpose register info * Header include order fix - Include repo headers first - Third party library headers next - standard library headers last * Update dispatch profiling public API - introduce rocprofiler_profile_counting_dispatch_data_t - change signature of rocprofiler_profile_counting_dispatch_callback_t and rocprofiler_profile_counting_record_callback_t - provide rocprofiler_user_data_t pointer in dispatch callback - provide rocprofiler_user_data_t value (from dispatch cb) in record callback * Update tests/bin/CMakeLists.txt - fix add_subdirectory(hip-in-libraries) order * Update VERSION - bump to 0.2.0 in prep for AFAR |
||
|
|
b0a88d9124 |
Update registration client search (#569)
* Update registration client search - Search ROCP_TOOL_LIBRARIES before dlopen search - Fatal error if ROCP_TOOL_LIBRARIES entry does not contain rocprofiler_configure symbol - Use RTLD_DEFAULT and RTLD_NEXT to (potentially) find first two instances of rocprofiler_configure - if no rocprofiler_configure found via RTLD_NEXT, do not do extensive search via link map * _GNU_SOURCE instead of GNU_SOURCE * Clang-tidy fix |
||
|
|
1bb94add11 |
Fix rocprofiler_iterate_callback_tracing_kind_operation_args for HIP compiler callbacks (#532)
* Fix HIP compiler iterate args
- `include/rocprofiler-sdk/hip/api_args.h`
- replace struct fields named "f" with "func"
- replace hip stream fields named "hStream" with "stream"
- `lib/rocprofiler-sdk/callback_tracing.cpp`
- iterate_args for HIP compiler table
- `lib/rocprofiler-sdk/registration.cpp`
- fix warning about roctx num_tables
- `lib/rocprofiler-sdk/hip/hip.def.cpp`
- replace struct fields named "f" with "func"
- replace hip stream fields named "hStream" with "stream"
- `lib/rocprofiler-sdk/{hip,hsa,marker}/utils.hpp`
- improve `stringize_impl`
- `lib/rocprofiler-sdk/hsa/code_object.cpp`
- remove stale commented out code
- `lib/rocprofiler-sdk/hsa/queue_controller.*`
- destory_queue -> destroy_queue
- `tests/tools/json-tool.cpp`
- improve parallelism in tool_tracing_callback
- serialize the marker api args
- only invoke rocprofiler_iterate_callback_tracing_kind_operation_args in exit phase
- `samples/counter_collection/CMakeLists.txt`
- reduce timeout on tests to 120 seconds
* Update lib/rocprofiler-sdk/hsa/utils.hpp
- disable dereference of double pointer in stringize_impl
* Update lib/common
- indirection_level in mpl.hpp
- stringize_arg.hpp
* Rework rocprofiler_iterate_callback_tracing_kind_operation_args
- provide more information in rocprofiler_callback_tracing_operation_args_cb_t
- support specifying the dereference level to account for output parameters
|
||
|
|
875f53b608 |
Correlation ID Retirement + misc (#527)
* Correlation ID Retirement
- include/rocprofiler-sdk/buffer_tracing.h
- add rocprofiler_buffer_tracing_correlation_id_retirement_record_t
- include/rocprofiler-sdk/fwd.h
- ROCPROFILER_BUFFER_TRACING_CORRELATION_ID_RETIREMENT
- lib/rocprofiler-sdk/buffer_tracing.cpp
- kind string for correlation id retirement
- lib/rocprofiler-sdk/buffer.hpp
- emplace returns bool
- lib/rocprofiler-sdk/registration.cpp
- pass lib_instance to copy_table functions
- lib/rocprofiler-sdk/context/context.*
- update correlation_id struct
- make ref_count private
- {get,add,sub}_ref_count() functions
- sub_ref_count() performs correlation id retirement
- use stack for "latest" thread-local correlation id
- lib/rocprofiler-sdk/hip/hip.*
- migrate to new {get,add,sub}_ref_count() for correlation ids
- return in iterate_args
- handle table instance in copy_table
- lib/rocprofiler-sdk/hsa/hsa.*
- migrate to new {get,add,sub}_ref_count() for correlation ids
- return in iterate_args
- handle table instance in copy_table
- lib/rocprofiler-sdk/marker/marker.*
- migrate to new {get,add,sub}_ref_count() for correlation ids
- return in iterate_args
- handle table instance in copy_table
- lib/rocprofiler-sdk/hsa/async_copy.cpp
- migrate to new {get,add,sub}_ref_count() for correlation ids
- handle table instance in async_copy_init / async_copy_save
- lib/rocprofiler-sdk/hsa/queue.cpp
- migrate to new {get,add,sub}_ref_count() for correlation ids
- tweak to external correlation id mapping in WriteInterceptor
- tests/async-copy-tracing/validate.py
- check retired_correlation_ids
- tests/common/serialization.hpp
- support rocprofiler_buffer_tracing_correlation_id_retirement_record_t
- tests/kernel-tracing/validate.py
- check retired_correlation_ids
- tests/common/CMakeLists.txt
- perfetto external project
- tests/common/perfetto.hpp
- perfetto categories + aliases
- add_perfetto_annotation
- metaprogramming helpers
- tests/tools/CMakeLists.txt
- link to tests-perfetto
- tests/tools/json-tool.cpp
- demangling functions
- serialization of marker API callback args
- reduce parallel bottleneck in tool_tracing_callback
- support correlation id retirement
- Multiple threads for buffers
- Support ROCPROFILER_TOOL_CONTEXTS_EXCLUDE env variable
- write_perfetto() function
* Update tests/rocprofv3/tracing/validate.py
- tweak test_hsa_api_trace
* Update PTL submodule
- fixes for data race during destruction of task
* Update lib/rocprofiler-sdk/buffer.*
- unique_buffer_vec_t uses std::unique_ptr instead of allocator::unique_static_ptr_t
* Reduce timeouts in counter collection samples [skip ci]
* Update tests/tools/json-tool.cpp
- tweak demangle(string_view, int*) -> demangle(string_view, int&)
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- move sub_ref_count() to later in async_copy_handler to delay retirement slightly more
|
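The commit above states that `sub_ref_count()` performs correlation ID retirement. A minimal Python sketch of that pattern follows, assuming retirement fires when the count reaches zero; the class layout, the initial count of 1, and the `on_retire` callback are assumptions for illustration and are not taken from the SDK's C++ `correlation_id` struct.

```python
import threading
from typing import Callable


class CorrelationId:
    """Hypothetical reference-counted correlation ID."""

    def __init__(self, internal_id: int, on_retire: Callable[[int], None]):
        self._id = internal_id
        self._ref_count = 1          # assumed: creator holds one reference
        self._lock = threading.Lock()
        self._on_retire = on_retire

    def get_ref_count(self) -> int:
        with self._lock:
            return self._ref_count

    def add_ref_count(self) -> int:
        with self._lock:
            self._ref_count += 1
            return self._ref_count

    def sub_ref_count(self) -> int:
        """Drop one reference; retire the ID when none remain."""
        with self._lock:
            self._ref_count -= 1
            remaining = self._ref_count
        if remaining == 0:
            self._on_retire(self._id)  # e.g. emit a retirement record
        return remaining
```

Delaying the final `sub_ref_count()` call, as the last bullet of the commit describes for `async_copy_handler`, correspondingly delays the retirement notification.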
||
|
|
3f39339926 |
API Tracing Overhaul (#437)
* Update include/rocprofiler-sdk/hsa/*
- split HSA API IDs into separate enumerations
- add support for finalize ext table
* Update include/rocprofiler-sdk/hip/*
- remove compiler_api_args.h
- rocprofiler_hip_api_args_t contains all for HIP runtime and HIP compiler
- ROCPROFILER_HIP_API_ID_ -> ROCPROFILER_HIP_RUNTIME_API_ID_
* Update include/rocprofiler-sdk/marker/table_api_id.h
- ROCPROFILER_MARKER_API_TABLE_ID_ -> ROCPROFILER_MARKER_TABLE_ID_
* Update include/rocprofiler-sdk/*/table_api_id.h
- table_api_id.h -> table_id.h
* Update include/rocprofiler-sdk/fwd.h
- ROCPROFILER_CALLBACK_TRACING_HSA_API split into 4 enum values:
- ROCPROFILER_CALLBACK_TRACING_HSA_CORE_API
- ROCPROFILER_CALLBACK_TRACING_HSA_AMD_EXT_API
- ROCPROFILER_CALLBACK_TRACING_HSA_IMAGE_EXT_API
- ROCPROFILER_CALLBACK_TRACING_HSA_FINALIZE_EXT_API
- ROCPROFILER_BUFFER_TRACING_HSA_API split into 4 enum values:
- ROCPROFILER_BUFFER_TRACING_HSA_CORE_API
- ROCPROFILER_BUFFER_TRACING_HSA_AMD_EXT_API
- ROCPROFILER_BUFFER_TRACING_HSA_IMAGE_EXT_API
- ROCPROFILER_BUFFER_TRACING_HSA_FINALIZE_EXT_API
- rocprofiler_callback_tracing_code_object_operation_t renamed to rocprofiler_code_object_operation_t (more consistent)
- doxygen updates
* Update include/rocprofiler-sdk/buffer_tracing.h
- improved doxygen comments
- removed unused rocprofiler_buffer_tracing_queue_scheduling_record_t
- removed unused rocprofiler_buffer_tracing_correlation_record_t
* Update include/rocprofiler-sdk/callback_tracing.h
- removed rocprofiler_callback_tracing_hip_compiler_api_data_t
- rocprofiler_hip_api_args_t and rocprofiler_hip_compiler_api_args_t were combined
- rocprofiler_hsa_api_retval_t and rocprofiler_hsa_compiler_api_retval_t were combined
* Update lib/rocprofiler-sdk/hsa/*
- utils.hpp
- formatters for hsa_ext_program_t and hsa_ext_control_directives_t
- defines.hpp
- removed variadic macros from lib/common/defines.hpp
- HSA_API_META_DEFINITION, HSA_API_INFO_DEFINITION_0, HSA_API_INFO_DEFINITION_V specialize on table id
- async_copy.cpp
- ROCPROFILER_HSA_API_ID_* -> ROCPROFILER_HSA_AMD_EXT_API_ID_*
- add table id to templates
- improve async_copy_fini
- hsa.hpp
- add hsa_table_id_lookup
- add hsa_domain_info
- add table id to templates
- add copy_table function
- hsa.cpp
- add table id to templates
- require hsa tables to be trivial and standard layout
- remove set_data_args specialization for hsa_amd_memory_async_copy_rect
- implement copy_table function
- hsa.def.cpp
- update enums
* Update lib/rocprofiler-sdk/hip/*
- defines.hpp
- use lib/common/defines.hpp
- add hip_table_id_lookup to HIP_API_TABLE_LOOKUP_DEFINITION
- hip.hpp
- hip_table_id_lookup
- template iterate_args on table id
- templated copy_table and update_table
- hip.cpp
- replaced api_id_bounds with hip_domain_info
- templated iterate_args on table id
- templated copy_table and update_table
* Update lib/rocprofiler-sdk/marker/*
- defines.hpp
- use lib/common/defines.hpp
- marker.cpp
- updated enums
- marker.def.cpp
- updated enums
* Update lib/rocprofiler-sdk/tests
- common.hpp
- ROCPROFILER_CALL_EXPECT
- callback_data_ext
- update get_callback_tracing_names with new enums
- update get_buffer_tracing_names with new enums
- external_correlation.cpp
- support new HSA API enums
- intercept_table.cpp
- use test/common.hpp
- update to new HSA API enums
- registration.cpp
- support new HSA API enums
- naming.cpp
- validation for all get_ids(), get_names(), name_by_id(), id_by_name(), etc.
* Update lib/common
- defines.hpp
- Move IMPL_DETAIL_FOR_EACH_NARG, GET_ADDR_MEMBER_FIELDS, and GET_NAMED_MEMBER_FIELDS here
- used by HSA, HIP, and Marker
- static_object.hpp
- is_trivial_standard_layout static constexpr member function
- suppress register_static_dtor when is_trivial_standard_layout
* Update lib/rocprofiler-sdk/hsa/code_object.*
- name_by_id
- id_by_name
- get_names
- get_ids
* Update lib/rocprofiler-sdk/registration.cpp
- Update rocprofiler_set_api_table for HSA
* Update lib/rocprofiler-sdk/callback_tracing.cpp
- Update for new HSA enums
- Rework to use switch statement
- rocprofiler_query_callback_tracing_kind_operation_name
- rocprofiler_iterate_callback_tracing_kind_operations
- rocprofiler_iterate_callback_tracing_kind_operation_args
* Update lib/rocprofiler-sdk/buffer_tracing.cpp
- Update for new HSA enums
- Rework to use switch statement
- rocprofiler_query_buffer_tracing_kind_operation_name
- rocprofiler_iterate_buffer_tracing_kind_operations
* Update lib/rocprofiler-sdk-tool
- helper.cpp
- update get_buffer_id_names with new enums
- update get_callback_id_names with new enums
- tools.cpp
- update to use new HSA enums
* Update samples/common
- added call_stack.hpp
- source_location struct
- call_stack_t alias
- print_call_stack function
- added name_info.hpp
- utils for getting buffer/callback domain and operation names
* Update samples/api_buffered_tracing/client.cpp
- use samples/common/call_stack.hpp
- use samples/common/name_info.hpp
- update for new HSA enums
* Update samples/api_callback_tracing/client.cpp
- use samples/common/call_stack.hpp
- use samples/common/name_info.hpp
- update for new HSA enums
* Update tests/tools/json-tool.cpp
- update for new HSA enums
* Update tests/rocprofv3/tracing/validate.py
- update for new HSA domain names
* Update samples/counter_collection/main.cpp
- reduce number of kernels to 50,000 since 200,000 causes issues with thread sanitizer
|
||
|
|
9efafc4d23 |
Split ROCTx API tables and update intercept table API (#421)
* Update include/rocprofiler-sdk
- buffer_tracing.h
- fix doxygen for rocprofiler_buffer_tracing_hip_api_record_t
- update doxygen for rocprofiler_buffer_tracing_marker_api_record_t
- remove unused marker_id field
- fwd.h
- Split ROCPROFILER_CALLBACK_TRACING_MARKER_API into ROCPROFILER_CALLBACK_TRACING_MARKER_{CORE,CONTROL,NAME}_API
- Split ROCPROFILER_BUFFER_TRACING_MARKER_API into ROCPROFILER_BUFFER_TRACING_MARKER_{CORE,CONTROL,NAME}_API
- split rocprofiler_runtime_library_t into rocprofiler_runtime_library_t and rocprofiler_intercept_table_t
- after split of ROCTx into 3 tables, specifying rocprofiler_at_internal_thread_create became confusing
* Update include/rocprofiler-sdk-roctx/api_trace.h
- Split into three tables: core, control, and name
- core: what it sounds like
- control: functions for controlling the profiler
- name: functions for giving resources names
* Update lib/rocprofiler-sdk-roctx/roctx.cpp
- modifications following split into multiple tables
* Update lib/rocprofiler-sdk/marker/*
- modifications following split of ROCTx API into multiple intercept tables
* Update lib/rocprofiler-sdk/tests
- common.hpp
- add enums to get_callback_tracing_names() and get_buffer_tracing_names()
- intercept_table.cpp
- update test to use rocprofiler_intercept_table_t (and enums) instead of rocprofiler_runtime_library_t
- update OR combos tested
- roctx.cpp
- updates following split of ROCTx API table into multiple tables
- use simplified specification of control API
* Update lib/rocprofiler-sdk
- buffer_tracing.cpp
- Updates for ROCPROFILER_BUFFER_TRACING_MARKER_{CORE,CONTROL,NAME}_API enum values
- callback_tracing.cpp
- Updates for ROCPROFILER_CALLBACK_TRACING_MARKER_{CORE,CONTROL,NAME}_API enum values
- intercept_table.hpp
- notify_runtime_api_registration -> notify_intercept_table_registration
- intercept_table.cpp
- updates for new rocprofiler_intercept_table_t enum and new ROCTx tables
- registration.cpp
- updates for new rocprofiler_intercept_table_t enum and new ROCTx tables
- updates for notify_runtime_api_registration -> notify_intercept_table_registration
* Update lib/rocprofiler-sdk-tool
- helper.cpp
- Updates for new enums in get_callback_id_names() and get_buffer_id_names()
- tool.cpp
- migrate to new enums for split ROCTx tables
- use simplified split for control table vs. core+name tables
* Update samples/{api_callback_tracing,intercept_table}
- intercept_table/client.cpp
- rocprofiler_runtime_library_t -> rocprofiler_intercept_table_t
- api_callback_tracing/client.cpp
- Updates for new enums in get_callback_id_names()
- use simplified split for control table vs. core+name tables
- migrate to new enums for split ROCTx tables
* Update tests
- rocprofv3/tracing/validate.py
- handle new marker domain names
- tools/json-tool.cpp
- Updates for new enums in get_callback_id_names() and get_buffer_id_names()
- use simplified split for control table vs. core+name tables
- migrate to new enums for split ROCTx tables
* Update tests/rocprofv3/tracing/CMakeLists.txt
- fix FAIL_REGULAR_EXPRESSION for rocprofv3-test-trace-execute
* Update lib/rocprofiler-sdk-tool/{output_file,tool}.*
- logging in output_file dtor
- support stdout/stderr
* Update lib/common/container/record_header_buffer.hpp
- reduce probability of is_empty() returning true while emplace is happening
* Update lib/rocprofiler-sdk-tool/tool.cpp
- logging for buffered_tracing_callback
- counter collection uses CSV encoder
* Update bin/rocprofv3
- remove -i flag from help menu
|
||
|
|
c641749fe6 |
HIP API Tracing (#357)
* Update include/rocprofiler-sdk/hip*
- updates for intercept table
* Update lib/common/units.hpp
- clang-tidy fixes
* Add lib/rocprofiler-sdk/hip
- tracing implementation for the HIP intercept table
* Update source/lib/rocprofiler-sdk/CMakeLists.txt
- add_subdirectory(hip)
* Update source/lib/rocprofiler-sdk/hsa
- offset function in hsa_api_info<Idx>
- remove report_activity, set_callback
- Tweak HSA_API_TABLE_LOOKUP_DEFINITION
* Update lib/rocprofiler-sdk/hip
- rocprofiler::hip::copy_table
- stringize_impl print dereferenced pointers when possible
* Update lib/rocprofiler-sdk/hsa/utils.hpp
- stringize_impl print dereferenced pointers when possible
* Update lib/rocprofiler-sdk/tests/intercept_table.cpp
- remove failures for intercepting HIP API tables
* Update include/rocprofiler-sdk/fwd.h
- add ROCPROFILER_HIP_RUNTIME_LIBRARY (== ROCPROFILER_HIP_LIBRARY)
- add ROCPROFILER_HIP_COMPILER_LIBRARY
* Update lib/rocprofiler-sdk/buffer_tracing.cpp
- Support ROCPROFILER_BUFFER_TRACING_HIP_API in rocprofiler_query_buffer_tracing_kind_operation_name
- Support ROCPROFILER_BUFFER_TRACING_HIP_API in rocprofiler_iterate_buffer_tracing_kind_operations
* Update lib/rocprofiler-sdk/callback_tracing.cpp
- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_query_callback_tracing_kind_operation_name
- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_iterate_callback_tracing_kind_operations
- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_iterate_callback_tracing_kind_operation_args
* Update lib/rocprofiler-sdk/intercept_table.cpp
- support HipDispatchTable and HipCompilerDispatchTable
* Update lib/rocprofiler-sdk/internal_threading.cpp
- Support ROCPROFILER_HIP_COMPILER_LIBRARY
* Update lib/rocprofiler-sdk/registration.cpp
- Support "hip" and "hip_compiler" in rocprofiler_set_api_table
- Added some extra logging
* Update samples/api_{buffered,callback}_tracing
- Modifications to demonstrate HIP API tracing
* Update tests/kernel-tracing
- Modifications to handle/test HIP API tracing
* Separate HIP tracing from HIP compiler tracing
* Fix installation of include/rocprofiler-sdk/hip/*
- add compiler and table headers to install
* Fixes to HIP interception
- hip_api_trace.hpp was updated a bit
- removed hipGetDeviceProperties (generic)
- added hipGetDevicePropertiesR0600
- added hipGetDevicePropertiesR0000
- removed hipRegisterTracerCallback
- reordered hipCreateChannelDesc, hipExtModuleLaunchKernel, hipHccModuleLaunchKernel
- added hipDrvGraphAddMemsetNode
- static asserts in hsa_api_info ensuring ordering of pointers
* Update lib/rocprofiler-sdk/hip/hip.*
- use size_t instead of rocprofiler_hip_table_api_id_t as non-type template parameter (smaller binary)
- separated out population of callback_context_data and buffered_context_data into non-template function (significantly smaller binary)
* Update lib/rocprofiler-sdk/hsa/hsa.*
- separated out population of callback_context_data and buffered_context_data into non-template function (significantly smaller binary)
* Update test/kernel-tracing/validate.py
- does not expect any hip_api_traces until libamdhip.so actually starts using rocprofiler-register
* Update tests/tools/json-tool.cpp
- fix context associated with "HIP_API_CALLBACK"
* Update external/CMakeLists.txt
- move misc variables to top of CMakeLists.txt so they apply to all external subprojects
- BUILD_TESTING (OFF)
- BUILD_SHARED_LIBS (OFF)
- BUILD_OBJECT_LIBS (OFF)
- BUILD_STATIC_LIBS (ON)
- CMAKE_POSITION_INDEPENDENT_CODE (ON)
- CMAKE_VISIBILITY_INLINES_HIDDEN (ON)
- CMAKE_CXX_VISIBILITY_PRESET (hidden)
- disable using libunwind in glog
* Update lib/rocprofiler-{sdk,sdk-tool}/CMakeLists.txt
- remove explicit setting of SKIP_BUILD_RPATH
* Update CMakeLists.txt
- set high-level CMAKE_BUILD_RPATH and CMAKE_INSTALL_RPATH_USE_LINK_PATH
* Update tests/CMakeLists.txt
- include(GNUInstallDirs)
* Update samples/CMakeLists.txt
- include(GNUInstallDirs)
* Update include/rocprofiler-sdk/hip/{compiler_api,api}_args.h
- remove extern "C" due to incompatibility b/t empty struct in C (size 0) vs. empty struct in C++ (size 1)
* Update lib/rocprofiler-sdk/hip/details/ostream.hpp
- clang-tidy fixes
* Update cmake/rocprofiler_linting.cmake
- add a feature for clang tidy exe
* Update lib/rocprofiler-sdk/hip/hip.cpp
- use recursion instead of fold expression due to clang-tidy errors (maximum nesting level exceeded)
* Update lib/rocprofiler-sdk/buffer_tracing.cpp
- fix merge
* Update lib/rocprofiler-sdk/callback_tracing.cpp
- fix merge
* Update bin/rocprofv3
- args for marker, HIP runtime, and HIP compiler tracing
* Update tests/apps/simple-transpose
- use roctx
* Update tests/rocprofv3/tracing
- validate marker API data
* Update lib/rocprofiler-sdk-tool
- support for HIP runtime, HIP compiler, marker API
* Update queue/queue_controller/registration/utility
- call hsa::queue_controller_fini() during finalization
- add a yield function to common/utility.hpp
- implements a thread yield + sleep
- add a sync function to Queue class
- add a iterate_queues member function to QueueController
- this is used to sync each queue during queue_controller_fini()
* Fix data races: queue/context/stable_vector
- stable_vector::emplace_back returns reference
- correlation id map uses stable_vector
- queue_info_session has explicit fields for queue id, hsa agent, rocp agent
- use hsa::get_table() in AsyncSignalHandler
- WriteInterceptor does not use TLS for context array
* Update lib/rocprofiler-sdk/hsa/hsa.*
- static object for API subtables
- accessors for API subtables
- google tests for HSA API subtables
* Update lib/rocprofiler-sdk/hsa/{queue,async_copy}.cpp
- use HSA subtable accessors
* Update rocprofiler_memcheck and CI workflow
- use GCC 13 instead of GCC 11 due to suspected false positives in thread sanitizer
- GCC 13 uses libtsan.so.2
* Update CI workflow
* Update lib/rocprofiler-sdk/counters/{metrics,counters}
- fix possibly dangling reference to a temporary from gcc-13
* Update thread-sanitizer-suppr.txt
- Ignore data races originating in hsa-runtime library
* Update cmake/rocprofiler_memcheck.cmake
- Deduce the sanitizer library to preload by compiling an application and extracting the linked sanitizer library
* Update tests/rocprofv3/tracing/CMakeLists.txt
- add csv files to REQUIRED_FILES and ATTACH_ON_FAIL in validate test
* Update lib/common/container/record_header_buffer.hpp
- fix data race identified by gcc v13 and libtsan.so.2
* Update hip API id, args, and def
- remove hipDrvGraphAddMemsetNode (not part of ROCm 6.0)
* Update lib/common/container/record_header_buffer.hpp
- fix deadlock in save/read/reset
* Update source/docs/CMakeLists.txt
- remove COMMAND_ERROR_IS_FATAL ANY to allow for printing of stdout/stderr
* Update lib/rocprofiler-sdk/hip/details/ostream.hpp
- remove overloads for HIP_MEMSET_NODE_PARAMS
* Update docs/CMakeLists.txt
- use find_program for shell instead of hardcoded /bin/bash
|
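The `extern "C"` removal noted in the commit above rests on a language difference: ISO C has no empty structs, GNU C accepts them as an extension with size 0, and C++ requires every complete object type to have a size of at least 1. A minimal sketch of the C++ side follows; `empty_args`, `empty_args_size`, and `distinct_addresses` are hypothetical names used only for illustration, not identifiers from rocprofiler-sdk.

```cpp
#include <cstddef>

// Hypothetical stand-in for an argument struct of an API call that takes no
// arguments. In C++ this is a complete type whose size is at least 1.
struct empty_args
{};

// Holds on every conforming C++ compiler. The same definition compiled as
// GNU C would report sizeof == 0, so a layout shared between C and C++
// translation units would disagree.
static_assert(sizeof(empty_args) >= 1, "C++ objects always occupy storage");

// Returns the size the C++ compiler assigns to the empty struct.
constexpr std::size_t
empty_args_size()
{
    return sizeof(empty_args);
}

// Array elements must have distinct addresses, which is why the size cannot
// be zero in C++.
inline bool
distinct_addresses()
{
    empty_args pair[2];
    return &pair[0] != &pair[1];
}
```

Any struct that embeds such a member can therefore have a different size and different member offsets depending on which language compiled the header.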
||
|
|
1f4cf1aa39 |
Tools update (#397)
* Srnagara/tool counters collect (#331)
* Adding counter collection capability to tools
* Adding counter collection feature to tools
* Adding counter collection capability to tools
* Fixing merge down issues
* Small tool fixes for build + prevent profile realloc
* Reproducing the counter name query issue in buffered callback
* Minor fix for init order + sample that directly uses sdk-tool for debug purposes
* Adding a temporary fix to print the counter names
* Fixing the output file name and reverting the changes of caching the profile config
* Fixing SGPR_Count value
* cleaning up debug prints
* Adding header to counter collection file
* Adding kernel filtering support
* Remove threading
* Cleaning up the code
* Removing redundant prints
* Revert "Remove threading"
This reverts commit 05c58fb9de826e92cf8d2e3d1c31d5578525dcb4.
* Revert "Cleaning up the code"
This reverts commit 1d964882bf2396dee8ad020cbb6c83b36e0674e9.
* Changing the tools code to align with init-order fix
* cmake formatting (cmake-format) (#335)
Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>
* source formatting (clang-format v11) (#336)
Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>
* Adding support for async memory copy
* source formatting (clang-format v11) (#391)
Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>
* Fixing header typo
* Fixing tool_fini
* Replacing the direction and kind fields values with description
* Update lib/rocprofiler-sdk-tool/helper.cpp
- Remove use of VLA
* Update lib/rocprofiler-sdk-tool/tool.cpp
- Formatting
* Migrate common/config.* to rocprofiler-sdk-tool
* Update lib/rocprofiler-sdk-tool/tool.cpp
- fix clang-tidy issues
* source formatting (clang-format v11) (#392)
Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>
* Update lib/common/mpl.hpp
- is_string_type / is_string_type_impl for deducing if type is a string type
* Update include/rocprofiler-sdk/fwd.h
- ROCPROFILER_BUFFER_TRACING_MEMORY_COPY_NONE starts at zero
* Update lib/rocprofiler-sdk/hsa/async_copy.*
- functions for operation ids and names
* Update lib/rocprofiler-sdk/buffer_tracing.cpp
- support iterating and getting names for ROCPROFILER_BUFFER_TRACING_MEMORY_COPY
* Update lib/rocprofiler-sdk-tool/config.*
- env ROCPROFILER_ prefix -> ROCPROF_ prefix
- add support for memory copy tracing, counter collection, etc.
* Update lib/rocprofiler-sdk-tool/helper.*
- removed TracerFlushRecord
- removed cxa_demangle (use one in common library)
- removed GetCounterNames (handled in config)
- removed GetKernelNames (handled in config)
* Add lib/rocprofiler-sdk-tool/output_file.*
- separate out get_output_stream function and output_file struct from tool.cpp
* Add lib/rocprofiler-sdk-tool/csv.hpp
- write_csv_entry automatically quotes strings
- csv_encoder struct enforces correct number of columns
* Update lib/rocprofiler-sdk-tool/CMakeLists.txt
- add new files
* Update lib/rocprofiler-sdk-tool/tool.cpp
- update construction of output_file class
- add kernel_symbol_data for serializing kernel trace data
- use config instead of env lookups
- optimize counter collection profile config lookup/creation
* Update bin/rocprofv3
- rocprofv3 --help exits with 0 (as it should)
- command-line arg for memory copy tracing
- command-line arg for mangled kernels
- command-line arg for truncated kernels
- env ROCPROFILER_ prefix -> env ROCPROF_ prefix
* Update tests/async-copy-tracing/validate.py
- update test_async_copy_direction to new enum values
* Update tests/kernel-tracing/validate.py
- update test_async_copy_direction to new enum values
* Update tests/tools/json-tool.cpp
- add ROCPROFILER_BUFFER_TRACING_MEMORY_COPY to supported buffer_name_info
* Update samples/counter_collection/{CMakeLists.txt,main.cpp}
- remove counter-collection-sdk-tool
* Update .github/workflows/docs.yml
- fix paths triggering running the workflow
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>
* adding counter collection support
* Adding counter collection test
* changing directory structure of counter collection tests
* Fixing test path for rocprofv3
* Adding hsa-tracing basic test
* cmake formatting (cmake-format) (#362)
Co-authored-by: bgopesh <bgopesh@users.noreply.github.com>
* counter collection tests drop2
* fixing hsa-trace test for rocprofv3 path
* python formatting (black) (#371)
Co-authored-by: bgopesh <bgopesh@users.noreply.github.com>
* both counter collection and tracing should work together
* Fixing rocprofv3 path
* Attempt to fix Segfault with AddressSanitizer
* fixing sanitizer segfault
* Update rocprofv3
* Update lib/rocprofiler-sdk-tool/README.md
- update env variables
* Update lib/rocprofiler-sdk/buffer_tracing.cpp
- return ROCPROFILER_STATUS_BUFFER_NOT_FOUND if buffer tracing service is configured with invalid buffer
* Update lib/rocprofiler-sdk-tool/tool.cpp
- designated hsa API trace buffer
* Update tests/hsa-tracing/CMakeLists.txt
- Fix environment
* Update rocprofv3
- do not override HSA_TOOLS_LIB
- support ROCPROF_PRELOAD
- LD_PRELOAD librocprofiler-sdk.so
* Restructure tests directory
- move all rocprofv3 integration tests into subfolder
* Update cmake/Templates/rocprofiler-sdk/config.cmake.in
- create rocprofiler-sdk::rocprofv3 cmake target
* Update tests/rocprofv3/hsa-tracing
- improve validate.py
- convert input to dict via csv.DictReader
* Update tests/apps/CMakeLists.txt
- fix build rpath for simple-transpose
* Update cmake/rocprofiler_memcheck.cmake
- prefer libtsan.so.0
* Update tests/rocprofv3/hsa-tracing
- move to tests/rocprofv3/tracing
- include kernel tracing and memory copy tracing
* Update lib/rocprofiler-sdk-tool/tool.cpp
- normalize "_ID" vs. "_Id" in CSV column names (use "_Id")
* Update lib/rocprofiler-sdk/buffer.{hpp,cpp}
- change signature of buffer::get_buffers()
- buffer::get_buffers() uses static_object
* Update lib/rocprofiler-sdk/context/context.cpp
- update usage of buffer::get_buffers()
- now returns pointer
* Update lib/rocprofiler-sdk/tests/buffer.cpp
- update to change for signature of buffer::get_buffers()
* Update tests/rocprofv3/tracing/CMakeLists.txt
- use %argt% with -d argument
* Update lib/rocprofiler-sdk-tool/tool.cpp
- use atexit for finalization
* Update tests/rocprofv3/tracing/CMakeLists.txt
- tweaked name of tests
* Update lib/rocprofiler-sdk/hsa/async_copy.*
- async_copy_fini + reference counting signals
* Update lib/rocprofiler-sdk/registration.cpp
- invoke hsa::async_copy_fini() to prevent data race on signals
---------
Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>
Co-authored-by: gobhardw <gopesh.bhardwaj@amd.com>
Co-authored-by: bgopesh <bgopesh@users.noreply.github.com>
|
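The commit above describes `csv.hpp` only briefly: `write_csv_entry` quotes strings and `csv_encoder` enforces the column count. The sketch below shows one way those two properties could be obtained in C++17. It is not the code from `csv.hpp`; `write_field` and `row_encoder` are hypothetical names, and the quote-doubling rule is an assumption borrowed from common CSV practice.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical helper: string-like values are wrapped in double quotes (with
// embedded quotes doubled); every other type is streamed unchanged.
template <typename T>
void
write_field(std::ostream& os, const T& value)
{
    if constexpr(std::is_convertible_v<T, std::string>)
    {
        os << '"';
        for(char c : std::string(value))
        {
            if(c == '"') os << '"';  // escape an embedded quote by doubling it
            os << c;
        }
        os << '"';
    }
    else
    {
        os << value;
    }
}

// Hypothetical encoder: the expected column count is a template parameter,
// so passing the wrong number of fields is rejected at compile time.
template <std::size_t NumColumns>
struct row_encoder
{
    template <typename... Args>
    static std::string encode(const Args&... args)
    {
        static_assert(sizeof...(Args) == NumColumns, "wrong number of CSV columns");
        std::ostringstream os;
        std::size_t        idx = 0;
        // Comma fold: emits a separator before every field except the first.
        ((write_field(os << (idx++ == 0 ? "" : ","), args)), ...);
        os << '\n';
        return os.str();
    }
};
```

With this sketch, `row_encoder<3>::encode("kernel", 42, 1.5)` yields the row `"kernel",42,1.5`, while `row_encoder<3>::encode("kernel")` fails to compile.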
||
|
|
21dd088c8e |
ROCTx Library Tracing (#390)
* Update include/rocprofiler-sdk/marker/*
- Update rocprofiler_marker_api_args_t for all API functions
- Add ROCPROFILER_MARKER_API_ID_roctxGetThreadId to rocprofiler_marker_api_id_t
* Update include/rocprofiler-sdk/marker/api_args.h
- fix include
* Update lib/common/mpl.hpp
- is_pair
- is_type_complete_v
* Update include/rocprofiler-sdk/marker/*
- fix rocprofiler_marker_api_retval_t
- add roctxGetThreadId to rocprofiler_marker_api_args_t
- fix type in enum: HsaDevice -> HsaAgent
- add table_api_id.h
* Update include/rocprofiler-sdk/marker.h
- include marker/table_api_id.h
* Update include/rocprofiler-sdk/buffer_tracing.h
- Buffer marker tracer records have begin and end timestamp
* Add lib/rocprofiler-sdk/marker
- tracing implementation for marker (roctx) library
* Update include/rocprofiler-sdk/{buffer_tracing,marker/table_api_id}.h
- rocprofiler_buffer_tracing_marker_record_t -> rocprofiler_buffer_tracing_marker_api_record_t
* Update lib/rocprofiler-sdk/buffer_tracing.cpp
- support for ROCPROFILER_BUFFER_TRACING_MARKER_API
* Update lib/rocprofiler-sdk/callback_tracing.cpp
- support for ROCPROFILER_CALLBACK_TRACING_MARKER_API
* Update lib/rocprofiler-sdk/intercept_table.cpp
- template instantiation for notify_runtime_api_registration
* Update lib/rocprofiler-sdk/registration.cpp
- enable roctx in rocprofiler_set_api_table
* Update lib/rocprofiler-sdk/marker/marker.cpp
- rocprofiler_buffer_tracing_marker_record_t -> rocprofiler_buffer_tracing_marker_api_record_t
* Update lib/rocprofiler/tests for roctx testing
- add roctx.cpp
- unit tests for roctx callback and buffer tracing
- support marker API in get_{buffer,callback}_tracing_names()
* Update lib/common/logging.cpp
- logging initialized message mentions env variable
* Update lib/common/mpl.hpp
- NOLINT for misc-definitions-in-headers
* Update lib/rocprofiler-sdk/tests/CMakeLists.txt
- include LD_LIBRARY_PATH in rocprofiler-lib-tests-shared tests
* Update lib/rocprofiler-sdk/registration.cpp
- client_library_vec_t is now vector of option<client_library>
- enables resetting the client_library after finalization
- removed acquiring registration lock when invoke_client_finalizers called via atexit
- this was causing some lock-order-inversion warnings (potential deadlock)
* Update lib/rocprofiler-sdk/agent.cpp
- model name for agent supports spaces
* Update tests/common/serialization.hpp
- add serialization support for marker tracing data structures
* Update tests/apps
- Add ROCTx markers into reproducible-runtime and transpose
* Update tests/tools/json-tools.cpp
- add marker tracing support
- remove strdup (no longer necessary)
* Update tests/kernel-tracing/validate.py
- validate marker API tracing data
* Update tests/async-copy-tracing/validate.py
- validate marker API tracing data
* Update cmake for load path resolution during testing
* Update tests/async-copy-tracing/CMakeLists.txt
- fix test LD_LIBRARY_PATH
* Update cmake/Templates/rocprofiler-sdk-roctx/config.cmake.in
- fix constructing rocprofiler-sdk-roctx::rocprofiler-sdk-roctx
|
||
|
|
dc8b8aa448 |
Cleanup + logging env variable (#387)
* [CP] Update tests/common/serialization.hpp
- remove duplication in rocprofiler_callback_tracing_code_object_load_data_t
* [CP] Update lib/rocprofiler-sdk/tests
- create common.hpp
- update registration.cpp to use common.hpp
* [CP] Add lib/common/logging.{hpp,cpp}
- generic init_logging function
* [CP] Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- remove excess logging
* [CP] Update lib/rocprofiler-sdk/registration.cpp
- use common::init_logging(...)
- enforce ROCPROFILER_REGISTER_FORCE_LOAD in rocprofiler_force_configure
- logging updates in rocprofiler_set_api_table
* Update include/rocprofiler-sdk/buffer_tracing.h
- rocprofiler_buffer_tracing_marker_record_t -> rocprofiler_buffer_tracing_marker_api_record_t
* Update lib/common/utility.hpp
- remove active_capacity_gate
* Update lib/rocprofiler-sdk/tests/common.hpp
- fix get_{callback,buffer}_tracing_names()
* Update lib/rocprofiler-sdk/counters/xml/{basic,derived}_counters.xml
- add entries for gfx1102
|
||
|
|
936816f762 |
Async memory copy tracing (#317)
* Update samples/api_buffered_tracing/client.cpp
- support ROCPROFILER_BUFFER_TRACING_MEMORY_COPY
* Update include/rocprofiler-sdk/{buffer_tracing,fwd}.h
- update rocprofiler_buffer_tracing_memory_copy_record_t
- add ROCPROFILER_BUFFER_TRACING_MEMORY_COPY_HOST_TO_HOST to rocprofiler_memory_copy_operation_t
* Update lib/rocprofiler-sdk/context/context.*
- get_registered_contexts functions (local copy)
* Update tests/apps/reproducible-runtime/reproducible-runtime.cpp
- include some memory allocations and memory copies for better testing
* Update tests/common/serialization.hpp
- update serialization save function for rocprofiler_buffer_tracing_memory_copy_record_t
* Update lib/rocprofiler-sdk/hsa/hsa.*
- remove stale set_callback / activity_functor_t code
- forward decl hsa_api_meta
- template struct hsa_api_func for getting function return type and args
* Update tests/kernel-tracing/validate.py
- enforce memory_copies data size
- test timestamps in memory copies data
- improve internal and external correlation id validation
* Update lib/rocprofiler-sdk/hsa/defines.hpp
- HSA_API_META_DEFINITION macro
* Update lib/rocprofiler/hsa/rocprofiler-sdk/hsa/hsa.def.cpp
- HSA_API_META_DEFINITION specializations for async copy functions
* Add lib/rocprofiler-sdk/hsa/async_copy.{hpp,cpp}
- implements buffer memory tracing
* Update lib/rocprofiler-sdk/registration.cpp
- invoke rocprofiler::hsa::async_copy_init
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- logging improvements
- improve hsa <-> rocp agent mapping
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- load original signal in async signal handler before store_screlease
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- use store_relaxed instead of store_screlease
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- logging
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- logging
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- misc changes
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- misc changes
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- misc changes
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- return function pointer instead of lambda
* Update reproducible-runtime.cpp
- device sync
* Update tests/apps/reproducible-runtime/reproducible-runtime.cpp
- use *Async variants of hipMalloc and hipMemcpy
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- populate async data properly
* Update tests/kernel-tracing/validate.py
- verification of async copy direction
* Update tests/apps/reproducible-runtime/reproducible-runtime.cpp
- temporarily disable async memcpy functions
* Create tests/tools
- directory containing tool libraries used for collecting data in integration tests
* Update tests/kernel-tracing
- remove kernel-tracing-test-tool library (now rocprofiler-sdk-json-tool)
- update cmake, validate.py, conftest.py accordingly
* Add tests/async-copy-tracing
- integration test validating async copy tracing in transpose example
* Update tests/CMakeLists.txt
- updates for restructuring
* Revert tests/apps/reproducible-runtime
- restore code to semi-original state (no memory copying)
* Update tests/async-copy-tracing/validate.py
- fix comment in test_async_copy_direction
* Fix building tests against installation
|
||
|
|
6b374b8e68 |
Improve static singleton memory safety (#316)
* Update GitHub links
* Update samples/api_buffered_tracing/client.cpp
- check if initialized before forcing initialization
* Add lib/common/static_object.*
- template class for creating a static allocation in the binary which has all the properties of a heap allocated singleton but does not trigger leak sanitizers
* Update include/rocprofiler-sdk/internal_threading.h
- document return values
* Update lib/rocprofiler-sdk/internal_threading.cpp
- return codes from rocprofiler_create_callback_thread and rocprofiler_assign_callback_thread
- use common::static_object for thread-pool object
* Update lib/rocprofiler-sdk/agent.cpp
- use common::static_object to store array of strings and their hashes
* Update lib/rocprofiler-sdk/hsa/code_object.cpp
- use common::static_object to store array of strings and their hashes to ensure strings exist until termination
* Update lib/rocprofiler-sdk/registration.cpp
- use common::static_object to store status and client libraries
- update return values for rocprofiler_set_api_table
* Update lib/rocprofiler-sdk/hsa/hsa.cpp
- check registration::get_fini_status() in hsa_api_impl::functor<Idx>(args...)
* Update lib/rocprofiler-sdk/context/context.cpp
- using common::static_object for correlation id map
|
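The `static_object` description in the commit above (a static allocation that behaves like a heap-allocated singleton without tripping leak sanitizers) matches a well-known pattern: reserve aligned storage in the binary and placement-new the object into it. The following is a minimal sketch of that pattern, not the contents of `lib/common/static_object.*`; the name `static_slot` and its interface are hypothetical, and this version is not thread-safe.

```cpp
#include <new>
#include <utility>

// Hypothetical sketch: one statically allocated slot per type T. The object
// is constructed on first request and never destroyed, so it stays valid
// during program termination, and no heap allocation is reported as leaked.
template <typename T>
class static_slot
{
public:
    // Constructs the object on the first call; later calls return the
    // existing object and ignore their arguments.
    template <typename... Args>
    static T* construct(Args&&... args)
    {
        if(!m_ptr) m_ptr = new(m_buffer) T(std::forward<Args>(args)...);
        return m_ptr;
    }

    // Returns nullptr until construct() has been called.
    static T* get() { return m_ptr; }

private:
    alignas(T) static inline unsigned char m_buffer[sizeof(T)] = {};
    static inline T* m_ptr                                    = nullptr;
};
```

A call such as `static_slot<int>::construct(7)` returns a pointer into the static buffer, and every later `static_slot<int>::get()` returns the same pointer. A production version would need synchronization around the first construction.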
||
|
|
9a0c84efa6 |
Use -sdk suffix and reset VERSION to 0.0.0 (#263)
* Fix find_package(rocprofiler) in build tree
* Move include/rocprofiler to include/rocprofiler-sdk
* Update include/CMakeLists.txt
- add_subdirectory(rocprofiler-sdk)
* Move lib/rocprofiler to lib/rocprofiler-sdk
* Move lib/rocprofiler-tool to lib/rocprofiler-sdk-tool
* Update lib/CMakeLists.txt
- add_subdirectory(rocprofiler-sdk)
- add_subdirectory(rocprofiler-sdk-tool)
* Update lib/rocprofiler-sdk/CMakeLists.txt
* Rename rocprofiler-tool to rocprofiler-sdk-tool
* Replace include rocprofiler/ with include rocprofiler-sdk/
* Replace include lib/rocprofiler/ with include lib/rocprofiler-sdk/
* Set VERSION to 0.0.0 and finish install to rocprofiler-sdk
* More fixes for rocprofiler -> rocprofiler-sdk
- fix issue with rocprofiler-sdk-config.cmake.in
- fix counters xml install path
* Fix documentation generation
* Create rocprofiler_LIB_ROCPROFILER_SDK_DIR for build tree
* cmake formatting (cmake-format) (#264)
Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
|