* cmake formatting (cmake-format) (#188)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* source formatting (clang-format v11) (#189)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: design of the pc sampling data struct; guarding parts of code that use ROCr marker packets
* source formatting (clang-format v11) (#191)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* cmake formatting (cmake-format) (#192)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: shadow variable fix
* pcs: fix for compiler errors reported by CI/CD
* source formatting (clang-format v11) (#193)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: docs fix; samples use the rocprofiler::rocprofiler library
* cmake formatting (cmake-format) (#195)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: client in samples folder fixed
* pcs: client requires rocprofiler package as dependency
* pcs: client uses single context
* source formatting (clang-format v11) (#196)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: client using single buffer; no buffer destroy in client
* pcs: client::setup explicitly called from the example
* pcs: rocprofiler_pc_sample_record_t updated
* pcs: fixed init of external correlation id
* source formatting (clang-format v11) (#198)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: remove outdated files; update CMakeLists
* cmake formatting (cmake-format) (#212)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: using rocprofiler_agent_id_t
* pcs: Removing trailing whitespaces
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
* source formatting (clang-format v11) (#214)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: mapping agent_id to the agent
* source formatting (clang-format v11) (#215)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: const while iterating over agents
* source formatting (clang-format v11) (#216)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: calling get_buffer instead of get_buffers
* pcs: workgroup typo
* pcs: documentation for the public PC sampling API
* pcs: queue_cb_t signature adaptation
* pcs: mocks removed
* pcs: updating HsaApiTable with HSA/ROCr PC sampling API
* pcs: querying available PC sampling configs through IOCTL
* pcs: create the PCS session in IOCTL
* pcs: first actual PC samples delivered to the rocprofiler's client :)
* pcs: works with marker packet too
* pcs: using HSA table to call pc sampling related functions
* pcs: using ioctl instead of kfd in naming
* pcs: configuration service test fixed
* pcs: sample processing test fixed
* pcs: marker packet macro wrapper removed
* pcs: marker packet is part of the rocprofiler_packet union
* pcs: one fixme added
* pcs: client that uses pc-sampling and code obj tracing
* pcs: client that supports PC sampling and code obj tracing refactored
* pcs: show more info for each PC sample
* pcs: hex output for the samples that do not belong to the matmul kernel
* pcs: querying avail configuration happens immediately before configuring
* pcs: hsa_ven_amd_pcs_create_from_id renamed
* pcs: using hsa_stop; accessing a buffer by id from parser
* pcs: includes reworked, tests returned to life
* pcs: rocprofiler dir removed as outdated
* cmake formatting (cmake-format) (#271)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* source formatting (clang-format v11) (#272)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: some warnings fixed
* source formatting (clang-format v11) (#273)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* cmake formatting (cmake-format) (#274)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: show MI200 relevant information in the sample
* pcs: queue cb fixed; rocr.h include fixed
* source formatting (clang-format v11) (#296)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: getting hsa_agent and the doorbell_id from hsa_queue
* source formatting (clang-format v11) (#297)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: correlation ID logic fixed
* source formatting (clang-format v11) (#303)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: pure pc sampling example fixed
* source formatting (clang-format v11) (#307)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* cmake formatting (cmake-format) (#308)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: interval value if the PC sampling is already configured
* pcs: ROCPROFILER_STATUS_ERROR_PC_SAMPLING_ALREADY_CONFIGURED
New status code if another process configured PC sampling service with different configuration.
Samples are extended to consider this case and retry if it happens.
* pcs: hsa_amd_queue_get_info mocked in tests
* source formatting (clang-format v11) (#328)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs (tests): query configs after configuring service
* source formatting (clang-format v11) (#329)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: sample checks workgroup_id_* and wave_id
* source formatting (clang-format v11) (#330)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs samples: running samples on the device 0
* pcs: kfd_ioctl updated
* pcs: ioctl config struct changed fields names
* pcs: status when PC sampling is configured by another process is renamed
* pcs: HSA PC sampling API table fixed
* pcs: tmp hack to be able to use HSA pc sampling table
* source formatting (clang-format v11) (#443)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs service uses CIDs generated by HIP API tracing service
* source formatting (clang-format v11) (#455)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* cmake formatting (cmake-format) (#456)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: CID manager
* pcs: explicit flush with no delivered data executes retirement logic
* source formatting (clang-format v11) (#464)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: rocprofiler_query_pc_sampling_agent_configurations docs update
* source formatting (clang-format v11) (#465)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: rocprofiler_configure_pc_sampling_service docs update
* pcs: explicit sync introduced in PCSCIDManager
* pcs: new logic for retiring CIDs in PC sampling service documented
* pcs: queue interception cb signature updated
* source formatting (clang-format v11) (#471)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: if no agent supports PC sampling, fail gracefully
* elaborating when KFD returns EBUSY and EEXIST
* pcs: the second PC sampling example fails gracefully
* code samples use only single kernel for now
* pcs: CID manager refactored
* source formatting (clang-format v11) (#481)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
* pcs: ioctl update
* source formatting (clang-format v11) (#531)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: code sample to test PC sampling applied on concurrent kernels
* source formatting (clang-format v11) (#533)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: pc sampling stress test included
* cmake formatting (cmake-format) (#539)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* source formatting (clang-format v11) (#540)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: standalone benchmark
* cmake formatting (cmake-format) (#555)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: glance in external correlation IDs
* source formatting (clang-format v11) (#557)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* another change in ioctl interface
* pcs: update queue interceptor callbacks and samples according to the agent 0 version
* source formatting (clang-format v11) (#611)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: avoid running problematic PC sampling test
* pcs: guarding tests not to fail on architectures not supporting PC sampling
* source formatting (clang-format v11) (#617)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: check IOCTL version prior to each KFD call
* pcs: ioctl refactoring
* pcs: PC sampling service increases the ref_count of the correlation ID of the kernel dispatch
* cmake formatting (cmake-format) (#631)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* source formatting (clang-format v11) (#632)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: PC sampling service provides external correlation IDs
* source formatting (clang-format v11) (#644)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: use rocprofiler_dim3_t for workgroup_id
* source formatting (clang-format v11) (#645)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: minor fixes
* pcs: updating the documentation for the pc sampling API functions
* pcs: api table and queue controller fix
* pcs: don't generate marker packets for the agent if PC sampling is not configured on it
* pcs: multi-GPU and single-GPU clients
* source formatting (clang-format v11) (#700)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: warning and errors fixed
* source formatting (clang-format v11) (#702)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: clang compiler errors and warnings fixed
* source formatting (clang-format v11) (#716)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: const reference in cid manager
* source formatting (clang-format v11) (#717)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: const & func in manager explicit
* pcs: test to cover creating PC sampling service of agent that does not exist
* pcs: generate marker packets if service is active
* source formatting (clang-format v11) (#719)
Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>
* pcs: refactoring hsa_adapter; use the correlation_id->thread_idx
* Update source/lib/rocprofiler-sdk/pc_sampling/cid_manager.cpp
* Update source/lib/rocprofiler-sdk/pc_sampling/cid_manager.cpp
* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp
* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp
* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp
* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp
* Update source/lib/rocprofiler-sdk/pc_sampling/utils.cpp
* Update utils.cpp
* moving pc-sampling tests and samples to pc-sampling label
* Format fix
* pcs: use configured instead of active service
* Update source/lib/rocprofiler-sdk/pc_sampling/service.cpp
* pcs: ensure configuring PC sampling on the HSA level is called only once
* pcs: minor fix
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
* pcs: refactoring IOCTL integration
* Update source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt
Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.hpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* pcs: reverting back what bot doubled
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* pcs: retesting the bot
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* pcs: why bot fails on this IOCTL status
* pcs: why failing on <vector>
* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* pcs: returning commits removed by bot
* pcs: formatting locally
* pcs: clients are flushing buffers inside the tool_fini
* pcs: sync function in public API
* pcs: sync prior to unloading the code object
* pcs: sync function requires context
* pcs: client uses CID retirement service
* pcs: test for flushing internal ROCr buffers
* pcs: source formatting
* Update source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* pcs: code samples refactoring
* pcs: public API header refactored
* pcs: rocprofiler_buffer_flush drains internal PC sampling buffers too
* pcs: remove unnecessary functions
* pcs: do not call hsa's copytables
* pcs: include reordering
* pcs: using ROCP_ERROR inside PC sampling implementation
* pcs: pc_sampling sample uses ostream instead of printfs
* pcs: pc_sampling_codeobj tracing using ostream instead of prints
* pcs: registering once for interceptor callbacks
* pcs: do not generate internal CIDs if not in debug mode
* pcs: rebasing fixed; missing external correlation IDs
* pcs: code formatting
* enable kernel tracing service to receive external correlation IDs
* pcs: using ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL
* pcs: polishing parser
* formatting
* updating parser to use workgroup_id
* kfd_ioctl.h extracted in details folder
* refactoring
* pcs: preparing to generate code object information
* flush internal buffers prior to unloading code object
* pcs: generating marker records
* pcs: wrap code_object's shutdown function
* ROCR_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES unsupported at the moment
* documenting that ROCR/HIP_VISIBLE_DEVICES are ignored
* pcs: separate structs for code object loading/unloading markers
* pcs: inst_pkt_t changed the namespace
* pcs: removing wrapper around the shutdown function
* pcs: size in record field
* pcs: documentation refactoring + typedefs
* renaming PCSAgentConfig to PCSAgentSession
* pcs: service does not keep a pointer to the context
* pcs: static assertions related to the versioning
* pcs: rocprofiler_pc_sampling_configuration_t size field
* pcs: report API unimplemented unless explicitly enabled
* pcs: skip tests if KFD does not support PC sampling
* pcs: if ROCr hides some devices, no PC samples will be delivered for it
* pcs: hip error check after kernel launch
* formatting
* removing PCS info from agent.h
* fix based on review
* Update continuous integration workflow
- use mi200 runner for code coverage (supports PC sampling)
- split sanitizer jobs across navi3, vega20, and mi300
* Updating pc sampling test labels
* ROCP_PC_SAMPLING_ENABLED env in CI
* ROCP_PC_SAMPLING_ENABLED for all CI mi200 jobs
* Rearrange sanitizer assignments
* fixes according to review
* removed unused functions
* pcs: rocprofiler_agent_id_t instead of handle as a key in map
* Update source/lib/rocprofiler-sdk/context/context.hpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* removing drm_fd from the agent.h
* pcs: removing one sample due to complexity
* pcs: refactoring sample
* simplifying sample
* new lines
* Improve queue_control enable interceptor logic
* Update lib/rocprofiler-sdk/hsa/types.hpp
- handle amd_ext size for HSA 1.12.0
* ROCP_PC_SAMPLING_ENABLED -> ROCPROFILER_PC_SAMPLING_BETA_ENABLED
* Update hsa_adapter.cpp
- anonymous namespace + remove debug
* parser update
* Apply suggestions from code review
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
Co-authored-by: vlaindic <vladimir.indic@amd.com>
Co-authored-by: vlaindic <vlaindic@amd.com>
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: gobhardw <gopesh.bhardwaj@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Stabilizing the machines used for testing
* Update .github/workflows/continuous_integration.yml
* Update continuous_integration.yml
* Delete .github/workflows/ci_pc_sampling.yml
* Update continuous_integration.yml
- add mi300
- use if conditions for whether to run "extended" core tests
* Consistency in matrix for each job
* Update continuous_integration.yml
- include runner in core CDash name
* Update continuous_integration.yml
- remove mi300
* Update continuous_integration.yml
- add mi300
* Update CI workflow
- tweak "Install requirements" step
* Update CI workflow
- timeout on install requirements
* Update CI workflow
- revert sanitizers to gcc-12
* Update CI workflow
- remove core installation of clang-tidy (handled by python pip)
* Update CI workflow
- disable fast-fail
* Update CI workflow
- add runner to all build names
- remove mi200 and mi300
* Update CI workflow
- code coverage runs on navi3
* Update CI workflow
- add runner to all build names
* Tweak to CI sanitizer jobs
* Update CI workflow
- Handle excluded tests
* Update CI workflow
- Handle excluded tests (part 2)
* Update CI workflow
- Handle excluded tests (fix quotations)
---------
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Fix drm include for OpenSUSE
- uses libdrm/drm.h instead of drm/drm.h
* Fix "List Files" step in CI workflows
* Fix "List Files" step in CI workflows
* Enable INFO logging on retried CI jobs
* Update lib/rocprofiler-sdk/async_copy.cpp
- rework active_signals
- make hsa_signal_t member variable
- remove sync from destructor
- replace _is_set with atomic counter
- timeout of 30 seconds hsa_signal_wait
- switch from relaxed to scacquire/screlease memory ordering
- improve logging and error handling
- destroy hsa signal in active_signals in async_fini
* Update lib/rocprofiler-sdk/async_copy.cpp
- active_signals::create
- change initial value of signal to 1 instead of value of completion signal
- change condition trigger of signal callback
* Update tests/counter-collection/validate.py
* Update lib/rocprofiler-sdk/async_copy.cpp
- improved logging
- fix hsa_signal_wait_scacquire_fn check
* Cleanup tests/lib/transpose/transpose.cpp
- remove huge comment block
* Appears to be working on MI200
Dependency Versions:
clr: f7b1398361 - compile mode: release
hsa-runtime: 4cd6c62f25dbbdbaa8580dd4ad8f388c98c508da - compile mode: RelWithDebug
* Update source/lib/rocprofiler-sdk/hsa/async_copy.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Format fix
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ammar ELWazir <aelwazir@hpe6u-21.amd.com>
* adding pandas and pytest to requirements.txt
* setting up requirements.txt
* Update requirements
- formatting packages
- remove packages not directly used by rocprofiler-sdk
* Update cmake formatting, linting, and options
- if BUILD_CI -> force BUILD_DEVELOPER and BUILD_WERROR
- support python installed clang-format and python installed clang-tidy
* Update build.sh
- split into install-deps.sh and install-apt-deps.sh
* Improve code coverage
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Added first ATT API
* Finalizing thread trace API
* Fixing more rebase conflicts
* Added codeobj disassembly sample
* Fixing merge issues with rebase [2]
* Adding ATT packets
* Implemented thread trace intercept
* Moved codeobj parser to same repo as rocprofiler
* Moved thread trace to new API
* Fixing merge conflicts
* Fixing more merge conflicts
* Adding thread trace packet reuse
* Merged aql_profile_v2 headers
* Linked ATT sample to aqlprofile
* Updated decoder to include non-loaded codeobjs
* Implemented ISA decoder into ATT sample
* Added marker_id to vaddr
* Updating aql_profile_v2 API to memcpy
* Updating thread trace API to include 64bit markers. Using the result of ISA matching.
* Added instruction type and cycles summary
* Updated sample with selection of kernel by kernel_object
* Added option to copy from memory kernels
* Moved tool_data in thread_trace to dynamic alloc
* Restoring hsa.cpp
* Fixed ATT sample crash. General improvements.
* Moved codeobj library to outside src/
* Updated license header
* Moved codeobj_capture to camelcase
* Solving some more merge conflicts
* Update samples/advanced_thread_trace/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update samples/advanced_thread_trace/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update samples/code_object_isa_decode/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update source/lib/rocprofiler-sdk/thread_trace/CMakeLists.txt
* Removing unused parameter check
* Adding const to isEmpty
* Removing unused warning
* Adding libdw-dev to requirements
* Running clang-format
* Commenting out new aql calls
* Clang format
* Unused variable fix
* Adding codeobj-decoder coverage
* Commenting out threadtrace
* Update samples/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* P
* WOverloaded
* Addressing clang-tidy
* Virtual destructor on ttracer class
* Corr id
* Fixing code source format
* Update CMakeLists.txt
* Build fixes
* Update source/lib/rocprofiler-sdk-codeobj/code_object_track.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Fix shadowing
* Update CMakeLists.txt
* Update samples/CMakeLists.txt
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
* Handle hsa_queue_destroy after finalization
- fixes issue where hsa_queue_destroy(...) is invoked after rocprofiler-sdk has finalized
- hsa::get_queue_controller() returns pointer
- if queue controller is a null pointer, skip invoking QueueController::destroy_queue
* Update HIP/HSA/marker update_table logging
* Update rocprofv3 tests
- remove HSA_TOOLS_LIB env variable
- remove setting ROCPROFILER_LOG_LEVEL env variable
- add timeouts to tests which are missing them
* Disable thread sanitizer deadlock detection
* Update CI workflow
- rename vega20-ubuntu job to core-ci
- enable navi32 in core-ci and sanitizers
* Update run-ci.py
- set gcovr html medium and high threshold
* Update lib/rocprofiler-sdk/hsa/queue_controller.cpp
- remove this capture from enable/disable serialization
* Update lib/rocprofiler-sdk/hsa/{hsa_barrier,profile_serializer}.*
- hsa_barrier::set_barrier accepts const-ref to queue map
- profile_serializer::enable and profile_serializer::disable accept const-ref to queue map
* Logging for HIP/HSA/marker/profile_serializer
* Logging for HIP/HSA/marker/queue_controller
* Improve test_retired_correlation_ids asserts
* Fix tests/counter-collection/validate.py
- scale expected SQ_WAVES counter value based on warp size of GPU
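
The scaling described above can be illustrated with a minimal sketch. It assumes the expected SQ_WAVES value equals the number of wavefronts needed to cover every workgroup of a 1-D dispatch; the function name, its parameters, and the formula are hypothetical and are not taken from validate.py.

```python
def expected_sq_waves(num_workgroups: int, workgroup_size: int, warp_size: int) -> int:
    """Hypothetical helper: wavefronts launched for a 1-D dispatch."""
    # Each workgroup needs ceil(workgroup_size / warp_size) wavefronts.
    waves_per_workgroup = (workgroup_size + warp_size - 1) // warp_size
    return num_workgroups * waves_per_workgroup


# Same launch evaluated for two warp sizes (64 and 32 are illustrative inputs).
waves_wave64 = expected_sq_waves(4, 256, 64)
waves_wave32 = expected_sq_waves(4, 256, 32)
```

Under this assumption, halving the warp size doubles the expected count, which is why a fixed expectation would only hold on one class of GPU.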
* Tweak github comment for code coverage
* Remove gcovr html high/medium threshold args
* Fix tests/counter-collection/validate.py
- round before casting to int in test_counter_values
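
Why rounding before the cast matters can be seen in a short, generic sketch. The values below are illustrative only and are not the counter values used in test_counter_values.

```python
scaled = (0.1 + 0.7) * 10  # mathematically 8, but stored as slightly less

truncated = int(scaled)       # int() drops the fractional part
rounded = int(round(scaled))  # round to nearest first, then convert
```

Because `int()` truncates toward zero, a value that lands just below an integer loses a whole unit unless it is rounded first.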
* operator bool for profile_serializer
- only wait on CV if profile_serializer is used
* Logging updates (profile_serializer + code_object)
* Update counter-collection validate.py
* QueueController does not wait on CV if finalizing/finalized
* Update CI workflow
- remove navi32 from core job
* Improve HIP/HSA/marker tracing get_functor/functor
- remove lambda wrapper around functor
* Update lib/rocprofiler-sdk/hsa/queue_controller.cpp
- do not acquire cvmutex lock during finalization
* Update lib/rocprofiler-sdk/hsa/hsa_barrier.*
- move ctor and dtor to implementation
- skip signal store screlease and destroy if already finalized
* Update CI workflow
- remove navi32 runners
* bwelton fixes for hangs
* CMake improvements + simplified demangle
- remove amd-comgr from common target (and thus removed from roctx DT_NEEDED)
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>