* Fix stream duplication and fixed tests
* Added comments to explain stream.cpp code, change stream nullptr check to occur in update table to prevent readding null stream, simplified hip-streams bin file code, add destroyStreams to hip-streams bin file code
* Removed roctx from CMakeLists.txt
* Updated documentation
* Fix documentation
* Removed update_table for HIP compiler table and updated stream.cpp to remove support for HIP compiler table
* Added runtime initialization check for HIP
* Changed tool name, working on fixing memory management
* Added context for counter collection kernel rename combination
* Changed name from map to set and changed description
* Fix documentation description for group-by-queue
* Merged memory copy and kernel operations onto a single track when on the same stream
* Updated perfetto output to remove hardware information from track name to merge all memory copy and kernel operations on the same stream to the same track:
* Most pr comments addressed
* Added filter for counter collection and removed kernel buffer tracing hack
* Added PR comment fixes
---------
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* Add Stack IDs
* Add memcpy test
* Add async corr id record
* Async events use `rocprofiler_async_correlation_id_t`
* Sync events use `rocprofiler_correlation_id_t`
* Update ATT to use asnyc IDs
* Review comments
* Fix navi3 kernel tracing
- conditional aql::set_profiler_active_on_queue only when counter collection is registered
* Update changelog
* Update following name change
* Add kernel profiling time info to counter collection records
- lib/rocprofiler-sdk/kernel_dispatch
- added profiling_time.{hpp,cpp}
- restructured tracing.cpp
- updated queue.cpp AsyncSignalHandler
- gets kernel dispatch profiling time and passes to dispatch_complete and signal callbacks
- structured some header includes to reduce cyclic include probability
- originally, including kernel_dispatch/tracing.hpp in hsa/queue.hpp created a lot of cyclic includes
* Fix kernel_dispatch.cpp includes
* Fix kernel_dispatch.cpp
- include <cstring>
- replace use of ROCPROFILER_HSA_AMD_EXT_API_ID_NONE with ROCPROFILER_KERNEL_DISPATCH_LAST
- missing-new-line CI job: ensures all source files end with new line
- logging updates
- add new line to the end of many files
- fix header include ordering is misc places
- transition to use hsa::get_core_table() and hsa::get_amd_ext_table() in various places instead of making copies
- source/lib/rocprofiler-sdk/hsa/queue.cpp
- Optimize WriteInterceptor to eliminate extra barrier packets causing gaps between kernels in kernel tracing
- increase timeout_hint in hsa_signal_wait in set_profiler_active_on_queue
- misc logging improvements
- source/lib/rocprofiler-sdk/counters/agent_profiling.cpp
- increase timeout_hint in hsa_signal_wait in set_profiler_active_on_queue
- tests/rocprofv3/hsa-queue-dependency/CMakeLists.txt
- add TIMEOUT for rocprofv3-test-hsa-multiqueue-execute
* Test fix for AQLProfile changes
AQL profile recently changed where the ordering
of packets now needs to be read then stop (instead
of stop and read previously). This change supports
the newer model.
* Update source/lib/rocprofiler-sdk/counters/dispatch_handlers.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* test barrier
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* cmake formatting (cmake-format) (#188)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* source formatting (clang-format v11) (#189)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: design of the pc sampling data struct; guarding parts of code that uses ROCr marker packets
* source formatting (clang-format v11) (#191)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* cmake formatting (cmake-format) (#192)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: shadow variable fix
* pcs: fix for compiler errors reported by CI/CD
* source formatting (clang-format v11) (#193)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: docs fix; samples uses rocprofiler::rocprofiler library
* cmake formatting (cmake-format) (#195)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: client in samples folder fixed
* pcs: client requires rocprofiler package as dependency
* pcs: client uses single context
* source formatting (clang-format v11) (#196)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: client using single buffer; no buffer destroy in client
* pcs: client::setup explicitly called from the example
* pcs: rocprofiler_pc_sample_record_t updated
* pcs: fixed init of external correlation id
* source formatting (clang-format v11) (#198)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: remove outdated files; update CMakeLists
* cmake formatting (cmake-format) (#212)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: using rocprofiler_agent_id_t
* pcs: Removing trailing whitespaces
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
* source formatting (clang-format v11) (#214)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: mapping agent_id to the agent
* source formatting (clang-format v11) (#215)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: const while iterating over agents
* source formatting (clang-format v11) (#216)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: calling get_buffer instead of get_buffers
* pcs: workgroup typo
* pcs: documentation for the public PC sampling API
* pcs: queue_cb_t signature adaptation
* pcs: mocks removed
* pcs: updating HsaApiTable with HSA/ROCr PC sampling API
* pcs: querying available PC sampling configs through IOCTL
* pcs: create the PCS session in IOCTL
* pcs: first actual PC samples delivered to the rocprofiler's client :)
* pcs: works with marker packet too
* pcs: using HSA table to call pc sampling related functions
* pcs: using ioctl instead of kfd in naming
* pcs: configuration service test fixed
* pcs: sample processing test fixed
* pcs: marker packet macro wrapper removed
* pcs: marker packet is part of the rocprofiler_packet union
* pcs: one fixme added
* pcs: client that uses pc-sampling and code obj tracing
* pcs: client that supprts PC sampling and code obj tracing refactored
* pcs: show more info for each PC sample
* pcs: hex output for the samples that do not belong to the matmul kernel
* pcs: querying avail configuration happens immediately before configuring
* pcs: hsa_ven_amd_pcs_create_from_id renamed
* pcs: using hsa_stop; accessing a buffer by id from parser
* pcs: includes reworked, tests returned to life
* pcs: rocrofiler dir removed as outdated
* cmake formatting (cmake-format) (#271)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* source formatting (clang-format v11) (#272)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: some warnings fixed
* source formatting (clang-format v11) (#273)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* cmake formatting (cmake-format) (#274)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: show MI200 relevant information in the sample
* pcs: queue cb fixed; rocr.h include fixed
* source formatting (clang-format v11) (#296)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: getting hsa_agent and the doorbell_id from hsa_queue
* source formatting (clang-format v11) (#297)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: correlation ID logic fixed
* source formatting (clang-format v11) (#303)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: pure pc sampling example fixed
* source formatting (clang-format v11) (#307)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* cmake formatting (cmake-format) (#308)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: interval value if the PC sampling is already configured
* pcs: ROCPROFILER_STATUS_ERROR_PC_SAMPLING_ALREADY_CONFIGURED
New status code if another process configured PC sampling service with different configuration.
Samples are extended to consider this case and retry if it happens.
* pcs: hsa_amd_queue_get_info mocked in tests
* source formatting (clang-format v11) (#328)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs (tests): query configs after configuring service
* source formatting (clang-format v11) (#329)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: sample checks workgroup_id_* and wave_id
* source formatting (clang-format v11) (#330)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs samples: running samples on the device 0
* pcs: kfd_ioctl updated
* pcs: ioctl config struct changed fields names
* pcs: status when PC sampling is configured by another process is renamed
* pcs: HSA PC sampling API table fixed
* pcs: tmp hack to be able to use HSA pc sampling table
* source formatting (clang-format v11) (#443)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs service use CIDs generated by HIP API tracing service
* source formatting (clang-format v11) (#455)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* cmake formatting (cmake-format) (#456)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: CID manager
* pcs: explicit flush with no delivered data executes retirement logic
* source formatting (clang-format v11) (#464)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: rocprofiler_query_pc_sampling_agent_configurations docs update
* source formatting (clang-format v11) (#465)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: rocprofiler_configure_pc_sampling_service docs update
* pcs: explicit sync introduced in PCSCIDManager
* pcs: new logic for retiring CIDs in PC sampling service documented
* pcs: queue interception cb signature updated
* source formatting (clang-format v11) (#471)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: if no agents supports PC sampling, fail gracefully
* elaborating when KFD returns EBUSY and EEXIST
* pcs: the second PC sampling examples fails gracefully
* code samples use only single kernel for now
* pcs: CID manager refactored
* source formatting (clang-format v11) (#481)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: ioctl update
* source formatting (clang-format v11) (#531)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs:code sample to test PC sampling applied on concurrent kernels
* source formatting (clang-format v11) (#533)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: pc sampling strest test included
* cmake formatting (cmake-format) (#539)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* source formatting (clang-format v11) (#540)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: standalone benchmark
* cmake formatting (cmake-format) (#555)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: glance in external correlation IDs
* source formatting (clang-format v11) (#557)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* another change in ioctl interface
* pcs: update queue interceptor callbacks and samples accroding to the agent 0 version
* source formatting (clang-format v11) (#611)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: avoid running problematic PC sampling test
* pcs: guarding tests not to fail on architectures not supporting PC sampling
* source formatting (clang-format v11) (#617)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: check IOCTL version prior to each KFD call
* pcs: ioctl refactoring
* pcs: PC sampling service increases the ref_count of the correlation ID of the kernel dispatch
* cmake formatting (cmake-format) (#631)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* source formatting (clang-format v11) (#632)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: PC sampling service provides external correlation IDs
* source formatting (clang-format v11) (#644)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: use rocprofiler_dim3_t for workgrou_ip
* source formatting (clang-format v11) (#645)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: minor fixes
* pcs: updating the documentation for the pc sampling API functions
* pcs: api table and queue controller fix
* pcs: don't generate marker packets for the agent if PC sampling is not configured on it
* pcs: multi-GPU and single-GPU clients
* source formatting (clang-format v11) (#700)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: warning and errors fixed
* source formatting (clang-format v11) (#702)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: clang compiler errors and warnings fixed
* source formatting (clang-format v11) (#716)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: const reference in cid manager
* source formatting (clang-format v11) (#717)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: const & func in manager explicit
* pcs: test to cover creating PC sampling service of agent that does not exist
* pcs: generate marker packets if service is active
* source formatting (clang-format v11) (#719)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: refactoring hsa_adapter; use the correlation_id->thread_idx
* Update source/lib/rocprofiler-sdk/pc_sampling/cid_manager.cpp
* Update source/lib/rocprofiler-sdk/pc_sampling/cid_manager.cpp
* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp
* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp
* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp
* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp
* Update source/lib/rocprofiler-sdk/pc_sampling/utils.cpp
* Update utils.cpp
* moving pc-sampling tests and samples to pc-sampling label
* Format fix
* pcs: use configured instead of active service
* Update source/lib/rocprofiler-sdk/pc_sampling/service.cpp
* pcs: ensure configuring PC sampling on the HSA level is called only once
* pcs: minor fix
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
* pcs: refactoring IOCTL integration
* Update source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt
Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.hpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* pcs: reverting back what bot doubled
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* pcs: retesting the bot
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* pcs: why bot fails on this IOCTL status
* pcs: why failing on <vector>
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* pcs: returning commits removed by bot
* pcs: formatting locally
* pcs: clients are flushing buffers inside the tool_fini
* pcs: sync function in public API
* pcs: sync prior to unloading the code object
* pcs: sync function requires context
* pcs: client uses CID retirement service
* pcs: test for flusing internal ROCr buffers
* pcs: source formatting
* Update source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* pcs: code samples refactoring
* pcs: public API header refactored
* pcs: rocprofiler_buffer_flush drains internal PC sampling buffers too
* pcs: remove unnecessary functions
* pcs: do not call hsa's copytables
* pcs: include reordering
* pcs: using ROCP_ERROR inside PC sampling implementation
* pcs: pc_sampling sample uses ostream instean of printfs
* pcs: pc_sampling_codeobj tracing using ostream instead of prints
* pcs: registering once for interceptor callbacks
* pcs: do not generate internal CIDs if not in debug mode
* pcs: rebasing fixed; missing external correlation IDs
* pcs: code formatting
* enable kernel tracing service to receive external correlation IDs
* pcs: using ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL
* pcs: polishing parser
* formatting
* updating parser to use workgroup_id
* kfd_ioctl.h extracted in details folder
* refactoring
* pcs: preparing to generate code object information
* flush internal buffers prior to unloading code object
* pcs: generating marker records
* pcs: wrap code_object's shutdown function
* ROCR_VISIBLE_DEVICES and HIP_VISISBLE_DEVICES unsupported at the moment
* documenting the ignorance of ROCR/HIP_VISIBLE_DEVICES
* pcs: separate structs for code object loading/unloading markers
* pcs: inst_pkt_t changed the namespace
* pcs: removing wrapper around the shutdown function
* pcs: size in record field
* pcs: documentation refactoring + typdefs
* renaming PCSAgentConfig to PCSAgentSession
* pcs: service does not keep a pointer to the context
* pcs: static assertions related to the versioning
* pcs: rocprofiler_pc_sampling_configuration_t size field
* pcs: report API unimplemented unleass explicitly enabled
* pcs: skip tests if KFD does not support PC sampling
* pcs: if ROCr hides some devices, no PC samples will be delivered for it
* pcs: hip error check after kernel launch
* formatting
* removing PCS info from agent.h
* fix based on review
* Update continuous integration workflow
- use mi200 runner for code coverage (supports PC sampling)
- split sanitizer jobs across navi3, vega20, and mi300
* Updating pc sampling test labels
* ROCP_PC_SAMPLING_ENABLED env in CI
* ROCP_PC_SAMPLING_ENABLED for all CI mi200 jobs
* Rearrange sanitizer assignments
* fixes according to review
* removed unused functions
* pcs: rocprofiler_agent_id_t instead of handle as a key in map
* Update source/lib/rocprofiler-sdk/context/context.hpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* removing drm_fd from the agent.h
* pcs: removing one sample due to complexity
* pcs: refactoring sample
* simplifying sample
* new lines
* Improve queue_control enable intercepter logic
* Update lib/rocprofiler-sdk/hsa/types.hpp
- handle amd_ext size for HSA 1.12.0
* ROCP_PC_SAMPLING_ENABLED -> ROCPROFILER_PC_SAMPLING_BETA_ENABLED
* Update hsa_adapter.cpp
- anonymous namespace + remove debug
* parser update
* Apply suggestions from code review
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
Co-authored-by: vlaindic <vladimir.indic@amd.com>
Co-authored-by: vlaindic <vlaindic@amd.com>
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: gobhardw <gopesh.bhardwaj@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* public codeobj info
* Restructure code object source code file layout
* Update get_unloaded_code_objects + add iterate_loaded_code_objects
* Remove get_unloaded_code_objects from visible internal API
- iterate_loaded_code_objects + functor which filters on the hsa_executable_t effectively reproduces this behavior
* Whitespace removal
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Remove -Wno-missing-field-initializers
- Compiler errors if missing field initializers
* Update lib/rocprofiler-sdk/counters/evaluate_ast.cpp
- copy over dispatch ID in perform_reduction/evaluate
* Added first ATT API
* Finalizing thread trace API
* Fixing more rebase conflicts
* Added codeobj disassembly sample
* Fixing merge issues with rebase [2]
* Adding ATT packets
* Implemented thread trace intercept
* Moved codeobj parser to same repo as rocprofiler
* Moved thread trace to new API
* Fixing merge conflicts
* Fixing more merge conflicts
* Adding thread trace packet reuse
* Merged aql_profile_v2 headers
* Linked ATT sample to aqlprofile
* Updated decoder to include non-loaded codeobjs
* Implemented ISA decoder into ATT sample
* Added marker_id to vaddr
* Updating aql_profile_v2 API to memcpy
* Updating thread trace API to include 64bit markers. Using the result of ISA matching.
* Added instruction type and cycles summary
* Updated sample with selection of kernel by kernel_object
* Added option to copy from memory kernels
* Moved tool_data in thread_trace to dynamic alloc
* Restoring hsa.cpp
* Fixed ATT sample crash. General improvements.
* Moved codeobj library to outside src/
* Updated license header
* Moved codeobj_capture to camelcase
* Solving some more merge conflicts
* Update samples/advanced_thread_trace/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update samples/advanced_thread_trace/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update samples/code_object_isa_decode/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update source/lib/rocprofiler-sdk/thread_trace/CMakeLists.txt
* Removing unused parameter check
* Adding const to isEmpty
* Removing unused warning
* Adding libdw-dev to requirements
* Running clang-format
* Commenting out new aql calls
* Clang format
* Unused variable fix
* Adding codeobj-decoder coverage
* Commenting out threadtrace
* Update samples/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* P
* WOverloaded
* Addressing clang-tidy
* Virtual destructor on ttracer class
* Corr id
* Fixing code source format
* Update CMakeLists.txt
* Build fixes
* Update source/lib/rocprofiler-sdk-codeobj/code_object_track.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Fix shadowing
* Update CMakeLists.txt
* Update samples/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
* Add debug printing for write interceptor injected packets
Adds debug printing for write interceptor injected
packets. All packets that pass through the write
intercepter while enabled will be printed.
Only executes/prints when the environment variable
GLOG_v is set to 2 or higher (otherwise it is a no-op
and the expression is not evaluated).
* source formatting (clang-format v11) (#675)
Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>
* Changes on fmt location
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>
* Handle hsa_queue_destroy after finalization
- fixes issue where hsa_queue_destroy(...) is invoked after rocprofiler-sdk has finalized
- hsa::get_queue_controller() returns pointer
- if queue controller is a null pointer, skip invoking QueueController::destroy_queue
* Update HIP/HSA/marker update_table logging
* Update rocprofv3 tests
- remove HSA_TOOLS_LIB env variable
- remove setting ROCPROFILER_LOG_LEVEL env variable
- add timeouts to tests which are missing them
* Disable thread sanitizer deadlock detection
* Update CI workflow
- rename vega20-ubuntu job to core-ci
- enable navi32 in core-ci and sanitizers
* Update run-ci.py
- set gcovr html medium and high threshold
* Update lib/rocprofiler-sdk/hsa/queue_controller.cpp
- remove this capture from enable/disable serialization
* Update lib/rocprofiler-sdk/hsa/{hsa_barrier,profile_serializer}.*
- hsa_barrier::set_barrier accepts const-ref to queue map
- profile_serializer::enable and profile_serializer::disable accept const-ref to queue map
* Logging for HIP/HSA/marker/profile_serializer
* Logging for HIP/HSA/marker/queue_controller
* Improve test_retired_correlation_ids asserts
* Fix tests/counter-collection/validate.py
- scale expected SQ_WAVES counter value based on warp size of GPU
* Tweak github comment for code coverage
* Remove gcovr html high/medium threshold args
* Fix tests/counter-collection/validate.py
- round before casting to int in test_counter_values
* operator bool for profile_serializer
- only wait on CV if profile_serializer is used
* Logging updates (profile_serializer + code_object)
* Update counter-collection validate.py
* QueueController does not wait on CV if finalizing/finalized
* Update CI workflow
- remove navi32 from core job
* Improve HIP/HSA/marker tracing get_functor/functor
- remove lambda wrapper around functor
* Update lib/rocprofiler-sdk/hsa/queue_controller.cpp
- do not acquire cvmutex lock during finalization
* Update lib/rocprofiler-sdk/hsa/hsa_barrier.*
- move ctor and dtor to implementation
- skip signal store screlease and destroy if already finalized
* Update CI workflow
- remove navi32 runners
* bwelton fixes for hangs
* CMake improvements + simplified demangle
- remove amd-comgr from common target (and thus removed from roctx DT_NEEDED)
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
* Improve error checks related to context create/start/stop/is_valid
* Bump version to 0.2.1
* Track number of kernels associated with correlation id
- add atomic kernel counter variable to context::correlation_id
* Update lib/rocprofiler-sdk/hsa/queue.cpp
- apply the +/- kernel count
* Moved tests/apps to tests/bin
* Renamed cmake project in tests/bin
* Update samples
- Use ROCPROFILER_DEFAULT_FAIL_REGEX
- tweaks to stdout messages
* Update tests
- Use ROCPROFILER_DEFAULT_FAIL_REGEX
* Add tests/lib
- libraries with HIP code
* Update PTL submodule
- remove atexit delete of thread_id_map
* Update cmake/rocprofiler_options.cmake
- Set ROCPROFILER_DEFAULT_FAIL_REGEX
* Update common lib: env + logging
- improved customization of logging settings
- default to disabling logging to files
- install failure handler for rocprofv3
- set_env support in environment.*
* Add lib/rocprofiler-sdk/shared_library.cpp
- shared library constructor
* Update lib/rocprofiler-sdk-tool/tool.cpp
- destructor thread safety
- convert callback_name_info and buffered_name_info to pointers
- install failure handler for logging
* Add tests/bin/hip-in-libraries
- hip-in-libraries is an exe which uses two shared libraries where each shared library contains HIP kernels
- used for testing deadlocking within __hipRegisterFatBinary
* Update bin/rocprofv3
- reorganized the env variables
- use exec to launch command
- set ROCPROFILER_LIBRARY_CTOR=1
* Add tests/rocprofv3/tracing-hip-in-libraries
- uses hip-in-libraries exe for exe which uses shared libraries to launch HIP kernels
* Update bin/rocprofv3
- fix counter collection (no exec)
* Update lib/rocprofiler-sdk-tool/tool.cpp
- replace "Kernel-Name" with "Kernel_Name"
* Update lib/rocprofiler-sdk/registration.cpp
Use RTLD_LOCAL instead of RTLD_GLOBAL for env libraries
* Update tests/rocprofv3
- replace "Kernel-Name" with "Kernel_Name"
* Update tests
- vector-ops (bin) stream syncs + runs with 4 queues per device
- improve counter-collection/input1 validation
- rocprofv3/tracing-hip-in-libraries does not do sys-trace
- improved validation script for tracing-hip-in-libraries
- updated dispatch_callback in json-tool.cpp following reworking of prototypes for counter collection
* Update samples/counter_collection
- updated dispatch_callback(s) and record_callback(s) following reworking of prototypes
* Update bin/rocprofv3
- reorganized help menu
- added options for sub-HSA tables
- added --hip-runtime-trace
- changed --hip-trace to include --hip-compiler-trace
* Update lib/rocprofiler-sdk-tool
- improved kernel filtering
- removed arch_vgpr, accum_vgpr, sgpr code (in rocprofiler-sdk)
- fixed issue with counter-collection w/o tracing
- added support for fine grained HSA API tracing
- removed directly linking to HSA-runtime
* Update lib/rocprofiler-sdk/agent.cpp
- rocp_agents != hsa_agents is non-fatal when ROCPROFILER_BUILD_CI=OFF (CMake option)
* GPR (vector and scalar) info in kernel symbol data
- rocprofiler_callback_tracing_code_object_kernel_symbol_register_data_t contains general purpose register info
* Header include order fix
- Include repo headers first
- Third party library headers next
- standard library headers last
* Update dispatch profiling public API
- introduce rocprofiler_profile_counting_dispatch_data_t
- change signature of rocprofiler_profile_counting_dispatch_callback_t and rocprofiler_profile_counting_record_callback_t
- provide rocprofiler_user_data_t pointer in dispatch callback
- provide rocprofiler_user_data_t value (from dispatch cb) in record callback
* Update tests/bin/CMakeLists.txt
- fix add_subdirectory(hip-in-libraries) order
* Update VERSION
- bump to 0.2.0 in prep for AFAR
* Update lib/rocprofiler-sdk/hsa/queue.cpp
- switch using the kernel_pkt.kernel_dispatch.completion_signal instead of interrupt signal for getting the dispatch time
* Update tests/kernel-tracing/validate.py
- add verification of total runtime collected in test_timestamps
- the sum of the runtime of all the kernels in reproducible-runtime should be ~1 sec +/- 10%
* Remove include/rocprofiler-sdk/rocprofiler_plugin.h
* Update CI workflow
- update actions/cache@v3 -> v4
- actions/cache/save@v3 -> v4
- thollander/actions-comment-pull-request@v2 -> v2.4.3
* Update pytest.ini
- change default options to one that is more verbose
* Update tests/kernel-tracing/CMakeLists.txt
- skip test_total_runtime when Address or Thread Sanitizer enabled
- overhead skews the results
* Update tests/kernel-tracing/validate.py
- separate test_total_runtime test
* Update include/rocprofiler-sdk/hip*
- updates for intercept table
* Update lib/common/units.hpp
- clang-tidy fixes
* Add lib/rocprofiler-sdk/hip
- tracing implementation for the HIP intercept table
* Update source/lib/rocprofiler-sdk/CMakeLists.txt
- add_subdirectory(hip)
* Update source/lib/rocprofiler-sdk/hsa
- offset function in hsa_api_info<Idx>
- remove report_activity, set_callback
- Tweak HSA_API_TABLE_LOOKUP_DEFINITION
* Update lib/rocprofiler-sdk/hip
- rocprofiler::hip::copy_table
- stringize_impl print dereferenced pointers when possible
* Update lib/rocprofiler-sdk/hsa/utils.hpp
- stringize_impl print dereferenced pointers when possible
* Update lib/rocprofiler-sdk/tests/intercept_table.cpp
- remove failures for intercepting HIP API tables
* Update include/rocprofiler-sdk/fwd.h
- add ROCPROFILER_HIP_RUNTIME_LIBRARY (== ROCPROFILER_HIP_LIBRARY)
- add ROCPROFILER_HIP_COMPILER_LIBRARY
* Update lib/rocprofiler-sdk/buffer_tracing.cpp
- Support ROCPROFILER_BUFFER_TRACING_HIP_API in rocprofiler_query_buffer_tracing_kind_operation_name
- Support ROCPROFILER_BUFFER_TRACING_HIP_API in rocprofiler_iterate_buffer_tracing_kind_operations
* Update lib/rocprofiler-sdk/callback_tracing.cpp
- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_query_callback_tracing_kind_operation_name
- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_iterate_callback_tracing_kind_operations
- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_iterate_callback_tracing_kind_operation_args
* Update lib/rocprofiler-sdk/intercept_table.cpp
- support HipDispatchTable and HipCompilerDispatchTable
* Update lib/rocprofiler-sdk/internal_threading.cpp
- Support ROCPROFILER_HIP_COMPILER_LIBRARY
* Update lib/rocprofiler-sdk/registration.cpp
- Support "hip" and "hip_compiler" in rocprofiler_set_api_table
- Added some extra logging
* Update samples/api_{buffered,callback}_tracing
- Modifications to demonstrate HIP API tracing
* Update tests/kernel-tracing
- Modifications to handle/test HIP API tracing
* Separate HIP tracing from HIP compiler tracing
* Fix installation of include/rocprofiler-sdk/hip/*
- add compiler and table headers to install
* Fixes to HIP interception
- hip_api_trace.hpp was updated a bit
- removed hipGetDeviceProperties (generic)
- added hipGetDevicePropertiesR0600
- added hipGetDevicePropertiesR0000
- removed hipRegisterTracerCallback
- reordered hipCreateChannelDesc, hipExtModuleLaunchKernel, hipHccModuleLaunchKernel
- added hipDrvGraphAddMemsetNode
- static asserts in hsa_api_info ensuring ordering of pointers
* Update lib/rocprofiler-sdk/hip/hip.*
- use size_t instead of rocprofiler_hip_table_api_id_t as non-type template parameter (smaller binary)
- separated out population of callback_context_data and buffered_context_data into non-template function (significantly smaller binary)
* Update lib/rocprofiler-sdk/hsa/hsa.*
- separated out population of callback_context_data and buffered_context_data into non-template function (significantly smaller binary)
* Update test/kernel-tracing/validate.py
- does not expect any hip_api_traces until libamdhip.so actually starts using rocprofiler-register
* Update tests/tools/json-tool.cpp
- fix context associated with "HIP_API_CALLBACK"
* Update external/CMakeLists.txt
- move misc variables to top of CMakeLists.txt so they apply to all external subprojects
- BUILD_TESTING (OFF)
- BUILD_SHARED_LIBS (OFF)
- BUILD_OBJECT_LIBS (OFF)
- BUILD_STATIC_LIBS (ON)
- CMAKE_POSITION_INDEPENDENT_CODE (ON)
- CMAKE_VISIBILITY_INLINES_HIDDEN (ON)
- CMAKE_CXX_VISIBILITY_PRESET (hidden)
- disable using libunwind in glog
* Update lib/rocprofiler-{sdk,sdk-tool}/CMakeLists.txt
- remove explicit setting of SKIP_BUILD_RPATH
* Update CMakeLists.txt
- set high-level CMAKE_BUILD_RPATH and CMAKE_INSTALL_RPATH_USE_LINK_PATH
* Update tests/CMakeLists.txt
- include(GNUInstallDirs)
* Update samples/CMakeLists.txt
- include(GNUInstallDirs)
* Update include/rocprofiler-sdk/hip/{compiler_api,api}_args.h
- remove extern "C" due to incompatibility b/t empty struct in C (size 0) vs. empty struct in C++ (size 1)
* Update lib/rocprofiler-sdk/hip/details/ostream.hpp
- clang-tidy fixes
* Update cmake/rocprofiler_linting.cmake
- add a feature for clang tidy exe
* Update lib/rocprofiler-sdk/hip/hip.cpp
- use recursion instead of fold expression due to clang-tidy errors (maximum nesting level exceeded)
* Update lib/rocprofiler-sdk/buffer_tracing.cpp
- fix merge
* Update lib/rocprofiler-sdk/callback_tracing.cpp
- fix merge
* Update bin/rocprofv3
- args for marker, HIP runtime, and HIP compiler tracing
* Update tests/apps/simple-transpose
- use roctx
* Update tests/rocprofv3/tracing
- validate marker API data
* Update lib/rocprofiler-sdk-tool
- support for HIP runtime, HIP compiler, marker API
* Update queue/queue_controller/registration/utility
- call hsa::queue_controller_fini() during finalization
- add a yield function to common/utility.hpp
- implements a thread yield + sleep
- add a sync function to Queue class
- add a iterate_queues member function to QueueController
- this is used to sync each queue during queue_controller_fini()
* Fix data races: queue/context/stable_vector
- stable_vector::emplace_back returns reference
- correlation id map uses stable_vector
- queue_info_session has explicit fields for queue id, hsa agent, rocp agent
- use hsa::get_table() in AsyncSignalHandler
- WriteInterceptor does not use TLS for context array
* Update lib/rocprofiler-sdk/hsa/hsa.*
- static object for API subtables
- accessors for API subtables
- google tests for HSA API subtables
* Update lib/rocprofiler-sdk/hsa/{queue,async_copy}.cpp
- use HSA subtable accessors
* Update rocprofiler_memcheck and CI workflow
- use GCC 13 instead of GCC 11 due to suspected false positives in thread sanitizer
- GCC 13 uses libtsan.so.2
* Update CI workflow
* Update lib/rocprofiler-sdk/counters/{metrics,counters}
- fix possibly dangling reference to a temporary from gcc-13
* Update thread-sanitizer-suppr.txt
- Ignore data races originating in hsa-runtime library
* Update cmake/rocprofiler_memcheck.cmake
- Deduce the sanitizer library to preload by compiling an application and extracting the linked sanitizer library
* Update tests/rocprofv3/tracing/CMakeLists.txt
- add csv files to REQUIRED_FILES and ATTACH_ON_FAIL in validate test
* Update lib/common/container/record_header_buffer.hpp
- fix data race identified by gcc v13 and libtsan.so.2
* Update hip API id, args, and def
- remove hipDrvGraphAddMemsetNode (not part of ROCm 6.0
* Update lib/common/container/record_header_buffer.hpp
- fix deadlock in save/read/reset
* Update source/docs/CMakeLists.txt
- remove COMMAND_ERROR_IS_FATAL ANY to allow for printing of stdout/stderr
* Update lib/rocprofiler-sdk/hip/details/ostream.hpp
- remove overloads for HIP_MEMSET_NODE_PARAMS
* Update docs/CMakeLists.txt
- use find_program for shell instead of hardcoded /bin/bash
* Update lib/rocprofiler-sdk/context/context.*
- get_registered_contexts functions (local copy)
* Update lib/rocprofiler-sdk/hsa/{queue,queue_controller}.cpp
- remove ROCPROFILER_BUFFER_TRACING_MEMORY_COPY code
* Update tests/kernel-tracing/kernel-tracing.cpp
- move stop() and flush() in tool_fini to before reporting of sizes of data collected
* Update lib/rocprofiler-sdk/hsa/hsa.*
- remove stale set_callback / activity_functor_t code
* Update lib/rocprofiler-sdk/buffer.cpp
- full wait instead of returning busy when buffer is busy
- use task_group::join instead of task_group::wait to fully wait for tasks to finish (bug fix)
* Update lib/rocprofiler-sdk/agent.cpp
- support agent mapping for CPU agents
* Remove direct access to vector of registered contexts
* Fix find_package(rocprofiler) in build tree
* Move include/rocprofiler to include/rocprofiler-sdk
* Update include/CMakeLists.txt
- add_subdirectory(rocprofiler-sdk)
* Move lib/rocprofiler to lib/rocprofiler-sdk
* Move lib/rocprofiler-tool to lib/rocprofiler-sdk-tool
* Update lib/CMakeLists.txt
- add_subdirectory(rocprofiler-sdk)
- add_subdirectory(rocprofiler-sdk-tool)
* Update lib/rocprofiler-sdk/CMakeLists.txt
* Rename rocprofiler-tool to rocprofiler-sdk-tool
* Replace include rocprofiler/ with include rocprofiler-sdk/
* Replace include lib/rocprofiler/ with include lib/rocprofiler-sdk/
* Set VERSION to 0.0.0 and finish install to rocprofiler-sdk
* More fixes for rocprofiler -> rocprofiler-sdk
- fix issue with rocprofiler-sdk-config.cmake.in
- fix counters xml install path
* Fix documentation generation
* Create rocprofiler_LIB_ROCPROFILER_SDK_DIR for build tree
* cmake formatting (cmake-format) (#264)
Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>