* General fixes to ATT, packets and event ID retrieval
* Update source/lib/rocprofiler-sdk/hsa/aql_packet.hpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
- source/lib/rocprofiler-sdk/hsa/queue.cpp
- Optimize WriteInterceptor to eliminate extra barrier packets causing gaps between kernels in kernel tracing
- increase timeout_hint in hsa_signal_wait in set_profiler_active_on_queue
- misc logging improvements
- source/lib/rocprofiler-sdk/counters/agent_profiling.cpp
- increase timeout_hint in hsa_signal_wait in set_profiler_active_on_queue
- tests/rocprofv3/hsa-queue-dependency/CMakeLists.txt
- add TIMEOUT for rocprofv3-test-hsa-multiqueue-execute
The following changes are introduced:
- Use functions instead of macros.
- Verify the error code when querying KFD IOCTL version.
- Skip tests and samples if KFD IOCTL < 1.16 or PC Sampling IOCTL < 0.1.
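A minimal sketch of the version gate described above, written as functions rather than macros. The type and function names (`ioctl_version_t`, `is_at_least`, `pc_sampling_supported`) are hypothetical and are not taken from the SDK; only the minimum versions (KFD IOCTL 1.16, PC Sampling IOCTL 0.1) come from this commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical version pair; not the SDK's actual type.
struct ioctl_version_t
{
    std::uint32_t major_version;
    std::uint32_t minor_version;
};

// True when `v` is at least `required` (major compared first, then minor).
constexpr bool
is_at_least(ioctl_version_t v, ioctl_version_t required)
{
    return (v.major_version > required.major_version) ||
           (v.major_version == required.major_version &&
            v.minor_version >= required.minor_version);
}

// Minimum versions as stated in the commit message.
constexpr ioctl_version_t min_kfd_ioctl = {1, 16};
constexpr ioctl_version_t min_pcs_ioctl = {0, 1};

// Tests and samples would be skipped when this returns false.
constexpr bool
pc_sampling_supported(ioctl_version_t kfd, ioctl_version_t pcs)
{
    return is_at_least(kfd, min_kfd_ioctl) && is_at_least(pcs, min_pcs_ioctl);
}

static_assert(pc_sampling_supported({1, 16}, {0, 1}), "minimum versions are accepted");
```

Using `constexpr` functions keeps the comparison type-checked and usable in `static_assert`, which a macro-based check would not give for free.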
* Incremental Counter Profile Creation
Adds support for incremental counter creation by changing the
behavior of rocprofiler_create_profile_config:
rocprofiler_create_profile_config(rocprofiler_agent_id_t agent_id,
rocprofiler_counter_id_t* counters_list,
size_t counters_count,
rocprofiler_profile_config_id_t* config_id)
The behavior of this function now allows an existing config_id to be
supplied via config_id. The counters contained in this config will be
copied over and used as a base for a new config along with any counters
supplied in counters_list. The new config id is returned via config_id
and can be used in future dispatch/agent counting sessions.
A new config is created rather than modifying the existing one because
there is no guarantee that the existing config isn't already in use. While
we could add locks (or other mutual exclusion mechanisms) to check whether
it's in use and reject an update, the benefit from doing so is minor in
comparison to just creating a new config. This also supports a common
pattern in which a tool adds counters at some point later on
during execution. Now it can do that without destroying the existing
config.
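The copy-then-extend behavior can be illustrated with a small self-contained model. This is only a sketch of the semantics described above, not the SDK implementation: `config_registry`, `counter_id_t`, and `config_id_t` are hypothetical stand-ins, and treating handle 0 as "no existing config" is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <utility>

// Hypothetical stand-ins for the SDK handle types.
struct counter_id_t { std::uint64_t handle; };
struct config_id_t  { std::uint64_t handle; };

class config_registry
{
public:
    // If *config_id names an existing config, its counters seed the new one.
    // The existing config is never modified; the new id is written to *config_id.
    void create(const counter_id_t* counters, std::size_t count, config_id_t* config_id)
    {
        assert(config_id != nullptr);
        std::set<std::uint64_t> merged{};
        auto existing = m_configs.find(config_id->handle);
        if(existing != m_configs.end()) merged = existing->second;  // copy the base
        for(std::size_t i = 0; i < count; ++i)
            merged.insert(counters[i].handle);
        config_id->handle = m_next_id++;
        m_configs.emplace(config_id->handle, std::move(merged));
    }

    // Number of counters held by a config (throws if the id is unknown).
    std::size_t size_of(config_id_t id) const { return m_configs.at(id.handle).size(); }

private:
    std::uint64_t m_next_id = 1;  // assumption: 0 means "no existing config"
    std::map<std::uint64_t, std::set<std::uint64_t>> m_configs;
};
```

A tool would keep the old id valid for any session already using it and switch to the returned id for later sessions.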
---------
Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* SWDEV-465322: Adding support for Perfcounter SIMD Mask in ATT
* Apply suggestions from code review
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
* Adding unit tests
* Adding counters check for gfx9 and SQ block only
* Addressing review comments
* changing the struct size
* fixing header includes
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
* Dispatch table copy/update uses ROCP_TRACE instead of ROCP_INFO
* Update rocprofiler-sdk CMake config
- rocprofiler::rocprofiler is alias to rocprofiler-sdk::rocprofiler-sdk instead of other way around
* Prefer rocprofiler-sdk::rocprofiler-sdk over rocprofiler::rocprofiler
* Fix WITH_UNWIND for glog
- requires a value of "none" instead of boolean now
* Update include/rocprofiler-sdk/registration.h
- explicit struct names to permit forward decl
* Update include/rocprofiler-sdk/cxx/serialization.hpp
- ROCPROFILER_SDK_CEREAL_NAMESPACE_BEGIN and ROCPROFILER_SDK_CEREAL_NAMESPACE_END to enable customized namespace
* Test fix for AQLProfile changes
AQLProfile recently changed the required ordering
of packets: it now needs to be read then stop (instead
of stop then read, as before). This change supports
the newer model.
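The ordering change can be pictured with a toy sequence builder. This is a sketch under assumptions: `packet_kind` and `build_dispatch_sequence` are hypothetical names, and placing a start packet before the kernel is assumed for illustration; only the read/stop ordering comes from the description above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical packet kinds; real AQLProfile packets are opaque AQL packets.
enum class packet_kind { start, kernel, read, stop };

// Returns the packet order around one kernel dispatch.
// read_then_stop == true models the newer ordering described above.
inline std::vector<packet_kind>
build_dispatch_sequence(bool read_then_stop)
{
    std::vector<packet_kind> seq = {packet_kind::start, packet_kind::kernel};
    if(read_then_stop)
    {
        seq.push_back(packet_kind::read);
        seq.push_back(packet_kind::stop);
    }
    else
    {
        seq.push_back(packet_kind::stop);
        seq.push_back(packet_kind::read);
    }
    assert(seq.size() == 4);
    return seq;
}
```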
* Update source/lib/rocprofiler-sdk/counters/dispatch_handlers.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* test barrier
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
- move ROCPROFILER_SDK_HSA_PC_SAMPLING define from lib/rocprofiler-sdk/hsa/hsa.hpp to lib/rocprofiler-sdk/pc_sampling/defines.hpp
- Update lib/rocprofiler-sdk/pc_sampling/CMakeLists.txt to return if HSA version is < 1.14.0
- update various includes for "lib/rocprofiler-sdk/pc_sampling/defines.hpp"
* cmake formatting (cmake-format) (#188)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* source formatting (clang-format v11) (#189)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: design of the pc sampling data struct; guarding parts of code that uses ROCr marker packets
* source formatting (clang-format v11) (#191)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* cmake formatting (cmake-format) (#192)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: shadow variable fix
* pcs: fix for compiler errors reported by CI/CD
* source formatting (clang-format v11) (#193)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: docs fix; samples uses rocprofiler::rocprofiler library
* cmake formatting (cmake-format) (#195)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: client in samples folder fixed
* pcs: client requires rocprofiler package as dependency
* pcs: client uses single context
* source formatting (clang-format v11) (#196)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: client using single buffer; no buffer destroy in client
* pcs: client::setup explicitly called from the example
* pcs: rocprofiler_pc_sample_record_t updated
* pcs: fixed init of external correlation id
* source formatting (clang-format v11) (#198)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: remove outdated files; update CMakeLists
* cmake formatting (cmake-format) (#212)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: using rocprofiler_agent_id_t
* pcs: Removing trailing whitespaces
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
* source formatting (clang-format v11) (#214)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: mapping agent_id to the agent
* source formatting (clang-format v11) (#215)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: const while iterating over agents
* source formatting (clang-format v11) (#216)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: calling get_buffer instead of get_buffers
* pcs: workgroup typo
* pcs: documentation for the public PC sampling API
* pcs: queue_cb_t signature adaptation
* pcs: mocks removed
* pcs: updating HsaApiTable with HSA/ROCr PC sampling API
* pcs: querying available PC sampling configs through IOCTL
* pcs: create the PCS session in IOCTL
* pcs: first actual PC samples delivered to the rocprofiler's client :)
* pcs: works with marker packet too
* pcs: using HSA table to call pc sampling related functions
* pcs: using ioctl instead of kfd in naming
* pcs: configuration service test fixed
* pcs: sample processing test fixed
* pcs: marker packet macro wrapper removed
* pcs: marker packet is part of the rocprofiler_packet union
* pcs: one fixme added
* pcs: client that uses pc-sampling and code obj tracing
* pcs: client that supports PC sampling and code obj tracing refactored
* pcs: show more info for each PC sample
* pcs: hex output for the samples that do not belong to the matmul kernel
* pcs: querying avail configuration happens immediately before configuring
* pcs: hsa_ven_amd_pcs_create_from_id renamed
* pcs: using hsa_stop; accessing a buffer by id from parser
* pcs: includes reworked, tests returned to life
* pcs: rocprofiler dir removed as outdated
* cmake formatting (cmake-format) (#271)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* source formatting (clang-format v11) (#272)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: some warnings fixed
* source formatting (clang-format v11) (#273)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* cmake formatting (cmake-format) (#274)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: show MI200 relevant information in the sample
* pcs: queue cb fixed; rocr.h include fixed
* source formatting (clang-format v11) (#296)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: getting hsa_agent and the doorbell_id from hsa_queue
* source formatting (clang-format v11) (#297)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: correlation ID logic fixed
* source formatting (clang-format v11) (#303)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: pure pc sampling example fixed
* source formatting (clang-format v11) (#307)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* cmake formatting (cmake-format) (#308)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: interval value if the PC sampling is already configured
* pcs: ROCPROFILER_STATUS_ERROR_PC_SAMPLING_ALREADY_CONFIGURED
New status code returned if another process has configured the PC sampling service with a different configuration.
Samples are extended to consider this case and retry if it happens.
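A retry loop of the kind the samples use might look roughly like the sketch below. Everything here is hypothetical: `status_t`, `pcs_config_t`, and `configure_with_retry` are illustrative names, and the `query` and `configure` callbacks stand in for the real query and configure calls, whose signatures are not shown in this log.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical status values; `already_configured` models the new status code.
enum class status_t { success, already_configured, error };

// Hypothetical configuration holding only the sampling interval.
struct pcs_config_t { std::uint64_t interval; };

// Re-queries the available configuration before every attempt and retries
// only while the service reports that it is already configured differently.
inline status_t
configure_with_retry(const std::function<pcs_config_t()>&                query,
                     const std::function<status_t(const pcs_config_t&)>& configure,
                     int                                                 max_attempts)
{
    assert(max_attempts > 0);
    status_t status = status_t::error;
    for(int attempt = 0; attempt < max_attempts; ++attempt)
    {
        status = configure(query());
        if(status != status_t::already_configured) break;
    }
    return status;
}
```

Querying immediately before each configure attempt narrows the window in which another process can change the active configuration, but it does not remove it, so the attempt count is bounded.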
* pcs: hsa_amd_queue_get_info mocked in tests
* source formatting (clang-format v11) (#328)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs (tests): query configs after configuring service
* source formatting (clang-format v11) (#329)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: sample checks workgroup_id_* and wave_id
* source formatting (clang-format v11) (#330)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs samples: running samples on the device 0
* pcs: kfd_ioctl updated
* pcs: ioctl config struct changed fields names
* pcs: status when PC sampling is configured by another process is renamed
* pcs: HSA PC sampling API table fixed
* pcs: tmp hack to be able to use HSA pc sampling table
* source formatting (clang-format v11) (#443)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs service use CIDs generated by HIP API tracing service
* source formatting (clang-format v11) (#455)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* cmake formatting (cmake-format) (#456)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: CID manager
* pcs: explicit flush with no delivered data executes retirement logic
* source formatting (clang-format v11) (#464)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: rocprofiler_query_pc_sampling_agent_configurations docs update
* source formatting (clang-format v11) (#465)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: rocprofiler_configure_pc_sampling_service docs update
* pcs: explicit sync introduced in PCSCIDManager
* pcs: new logic for retiring CIDs in PC sampling service documented
* pcs: queue interception cb signature updated
* source formatting (clang-format v11) (#471)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: if no agent supports PC sampling, fail gracefully
* elaborating when KFD returns EBUSY and EEXIST
* pcs: the second PC sampling example fails gracefully
* code samples use only single kernel for now
* pcs: CID manager refactored
* source formatting (clang-format v11) (#481)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: ioctl update
* source formatting (clang-format v11) (#531)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: code sample to test PC sampling applied on concurrent kernels
* source formatting (clang-format v11) (#533)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: pc sampling stress test included
* cmake formatting (cmake-format) (#539)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* source formatting (clang-format v11) (#540)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: standalone benchmark
* cmake formatting (cmake-format) (#555)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: glance in external correlation IDs
* source formatting (clang-format v11) (#557)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* another change in ioctl interface
* pcs: update queue interceptor callbacks and samples according to the agent 0 version
* source formatting (clang-format v11) (#611)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: avoid running problematic PC sampling test
* pcs: guarding tests not to fail on architectures not supporting PC sampling
* source formatting (clang-format v11) (#617)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: check IOCTL version prior to each KFD call
* pcs: ioctl refactoring
* pcs: PC sampling service increases the ref_count of the correlation ID of the kernel dispatch
* cmake formatting (cmake-format) (#631)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* source formatting (clang-format v11) (#632)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: PC sampling service provides external correlation IDs
* source formatting (clang-format v11) (#644)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: use rocprofiler_dim3_t for workgroup_id
* source formatting (clang-format v11) (#645)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: minor fixes
* pcs: updating the documentation for the pc sampling API functions
* pcs: api table and queue controller fix
* pcs: don't generate marker packets for the agent if PC sampling is not configured on it
* pcs: multi-GPU and single-GPU clients
* source formatting (clang-format v11) (#700)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: warning and errors fixed
* source formatting (clang-format v11) (#702)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: clang compiler errors and warnings fixed
* source formatting (clang-format v11) (#716)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: const reference in cid manager
* source formatting (clang-format v11) (#717)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: const & func in manager explicit
* pcs: test to cover creating PC sampling service of agent that does not exist
* pcs: generate marker packets if service is active
* source formatting (clang-format v11) (#719)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: refactoring hsa_adapter; use the correlation_id->thread_idx
* Update source/lib/rocprofiler-sdk/pc_sampling/cid_manager.cpp
* Update source/lib/rocprofiler-sdk/pc_sampling/cid_manager.cpp
* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp
* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp
* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp
* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp
* Update source/lib/rocprofiler-sdk/pc_sampling/utils.cpp
* Update utils.cpp
* moving pc-sampling tests and samples to pc-sampling label
* Format fix
* pcs: use configured instead of active service
* Update source/lib/rocprofiler-sdk/pc_sampling/service.cpp
* pcs: ensure configuring PC sampling on the HSA level is called only once
* pcs: minor fix
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
* pcs: refactoring IOCTL integration
* Update source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt
Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.hpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* pcs: reverting back what bot doubled
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* pcs: retesting the bot
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* pcs: why bot fails on this IOCTL status
* pcs: why failing on <vector>
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* pcs: returning commits removed by bot
* pcs: formatting locally
* pcs: clients are flushing buffers inside the tool_fini
* pcs: sync function in public API
* pcs: sync prior to unloading the code object
* pcs: sync function requires context
* pcs: client uses CID retirement service
* pcs: test for flushing internal ROCr buffers
* pcs: source formatting
* Update source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* pcs: code samples refactoring
* pcs: public API header refactored
* pcs: rocprofiler_buffer_flush drains internal PC sampling buffers too
* pcs: remove unnecessary functions
* pcs: do not call hsa's copytables
* pcs: include reordering
* pcs: using ROCP_ERROR inside PC sampling implementation
* pcs: pc_sampling sample uses ostream instead of printfs
* pcs: pc_sampling_codeobj tracing using ostream instead of prints
* pcs: registering once for interceptor callbacks
* pcs: do not generate internal CIDs if not in debug mode
* pcs: rebasing fixed; missing external correlation IDs
* pcs: code formatting
* enable kernel tracing service to receive external correlation IDs
* pcs: using ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL
* pcs: polishing parser
* formatting
* updating parser to use workgroup_id
* kfd_ioctl.h extracted in details folder
* refactoring
* pcs: preparing to generate code object information
* flush internal buffers prior to unloading code object
* pcs: generating marker records
* pcs: wrap code_object's shutdown function
* ROCR_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES unsupported at the moment
* documenting that ROCR/HIP_VISIBLE_DEVICES are ignored
* pcs: separate structs for code object loading/unloading markers
* pcs: inst_pkt_t changed the namespace
* pcs: removing wrapper around the shutdown function
* pcs: size in record field
* pcs: documentation refactoring + typedefs
* renaming PCSAgentConfig to PCSAgentSession
* pcs: service does not keep a pointer to the context
* pcs: static assertions related to the versioning
* pcs: rocprofiler_pc_sampling_configuration_t size field
* pcs: report API unimplemented unless explicitly enabled
* pcs: skip tests if KFD does not support PC sampling
* pcs: if ROCr hides some devices, no PC samples will be delivered for them
* pcs: hip error check after kernel launch
* formatting
* removing PCS info from agent.h
* fix based on review
* Update continuous integration workflow
- use mi200 runner for code coverage (supports PC sampling)
- split sanitizer jobs across navi3, vega20, and mi300
* Updating pc sampling test labels
* ROCP_PC_SAMPLING_ENABLED env in CI
* ROCP_PC_SAMPLING_ENABLED for all CI mi200 jobs
* Rearrange sanitizer assignments
* fixes according to review
* removed unused functions
* pcs: rocprofiler_agent_id_t instead of handle as a key in map
* Update source/lib/rocprofiler-sdk/context/context.hpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* removing drm_fd from the agent.h
* pcs: removing one sample due to complexity
* pcs: refactoring sample
* simplifying sample
* new lines
* Improve queue_control enable interceptor logic
* Update lib/rocprofiler-sdk/hsa/types.hpp
- handle amd_ext size for HSA 1.12.0
* ROCP_PC_SAMPLING_ENABLED -> ROCPROFILER_PC_SAMPLING_BETA_ENABLED
* Update hsa_adapter.cpp
- anonymous namespace + remove debug
* parser update
* Apply suggestions from code review
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
Co-authored-by: vlaindic <vladimir.indic@amd.com>
Co-authored-by: vlaindic <vlaindic@amd.com>
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: gobhardw <gopesh.bhardwaj@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Perfetto submodule
* include/rocprofiler-sdk/cxx/perfetto.hpp
- adapted from tests/common/perfetto.hpp
- updated json-tool to use <rocprofiler-sdk/cxx/perfetto.hpp>
* Update include/rocprofiler-sdk/cxx
- add details/delimit.hpp
- add details/join.hpp
- extend details/mpl.hpp
- extend details/operators.hpp
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- update MEMORY_COPY direction names
* Preliminary perfetto support
* Update lib/rocprofiler-sdk-tool/generatePerfetto.cpp
- fix getting roctx msg vs. buffer operation name
* Temporary variable restructuring
* Perfetto patches after rebasing onto main
* Revert lib/rocprofiler-sdk/hsa/async_copy.cpp
- revert name
* Update lib/rocprofiler-sdk-tool/generatePerfetto.cpp
- fix ReadTrace
* Update tests/bin/hip-in-libraries
- sleep_for
* Support PFTRACE output format option in rocprofv3
* Change perfetto logging
* Update rocprofv3 tests to generate pftrace output
* Minor tweak to json-tool.cpp
* Update requirements.txt for perfetto testing
* Fix data race on amount_read in generatePerfetto.cpp
* Add testing for pftrace output
- relatively simple testing which verifies that the pftrace file has the same number of entries as JSON data for HIP/HSA/marker/kernel/memory_copy
* Fix import in perfetto_reader.py
* Fix data race in generatePerfetto.cpp