d73178f77af25d3c3116a8030fd2a0fac6da4a56
46 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
3547a45c0c |
Improve buffer flush error handling (#416)
* Update include/rocprofiler-sdk/fwd.h
- add ROCPROFILER_STATUS_ERROR_FINALIZED error code
* Update lib/rocprofiler-sdk/rocprofiler.cpp
- status string for ROCPROFILER_STATUS_ERROR_FINALIZED
* Update lib/rocprofiler-sdk/buffer.cpp
- return error code if buffer flush invoked after finalized
- fatal error if task group destroyed
- error message if task runs after finalized
- improve join of task group
* Update lib/rocprofiler-sdk/counters/tests/evaluate_ast_tests.cpp
- Update lambdas to return reference due to strange -Warray-bounds and -Wstringop-overflow warnings with g++ (Ubuntu 13.1.0-8ubuntu1~20.04.2) 13.1.0
|
||
|
|
9a8b6f6b7b |
Counter API and Samples Updates (#410)
* Update include/rocprofiler-sdk/{counters,profile_config}.h
- use rocprofiler_agent_id_t instead of rocprofiler_agent_t
* Update samples
- use rocprofiler-sdk::rocprofiler-sdk instead of rocprofiler::rocprofiler in cmake
- api_callback_tracing sample roctxProfiler{Pause,Resume}
- api_callback_tracing sample uses ROCTx
- updates to use rocprofiler_agent_id_t
* Update run-ci.py
- exclude rocprofiler-sdk-tool from samples (no sample uses that code)
* Update lib/rocprofiler-sdk-tool/tool.cpp
- Update rocprofiler_iterate_agent_supported_counters to use agent ID
* Update lib/rocprofiler-sdk/counters/core.*
- profile_config has pointer to agent instead of copy
* Update lib/rocprofiler-sdk/agent.*
- provide get_agent(...) func via rocp agent id
* Update lib/rocprofiler-sdk/{buffer,callback}_tracing.cpp
- return ROCPROFILER_STATUS_ERROR_NOT_IMPLEMENTED for enums missing implementation
* Update lib/rocprofiler-sdk/counters.cpp
- update to use rocprofiler_agent_id_t instead of rocprofiler_agent_t
* Update lib/rocprofiler-sdk/profile_config.cpp
- update to use rocprofiler_agent_id_t instead of rocprofiler_agent_t
* Update source/docs
- requirements.txt + install reqs in cmake
* Bump version to 0.1.0
* Update samples/api_callback_tracing/CMakeLists.txt
- LD_LIBRARY_PATH for test
* Update test/rocprofv3/tracing/CMakeLists.txt
- reorder validation files so memory copy comes first
* Update lib/rocprofiler-sdk-tool/tool.cpp
- logging for flushing buffers
- variables for buffer_size and buffer_watermark
- increase the watermark to a full buffer
- use dedicated threads for each buffer
* Update lib/rocprofiler-sdk-tool/CMakeLists.txt
- test sets ROCPROF_LOG_LEVEL and ROCPROFILER_LOG_LEVEL to info
* Remove lib/rocprofiler-sdk-tool/trace_buffer.hpp
* Update lib/rocprofiler-sdk-tool/CMakeLists.txt
- drop log level to warning when leak sanitizer is enabled (produces small memory leak)
|
||
|
|
c641749fe6 |
HIP API Tracing (#357)
* Update include/rocprofiler-sdk/hip*
- updates for intercept table
* Update lib/common/units.hpp
- clang-tidy fixes
* Add lib/rocprofiler-sdk/hip
- tracing implementation for the HIP intercept table
* Update source/lib/rocprofiler-sdk/CMakeLists.txt
- add_subdirectory(hip)
* Update source/lib/rocprofiler-sdk/hsa
- offset function in hsa_api_info<Idx>
- remove report_activity, set_callback
- Tweak HSA_API_TABLE_LOOKUP_DEFINITION
* Update lib/rocprofiler-sdk/hip
- rocprofiler::hip::copy_table
- stringize_impl print dereferenced pointers when possible
* Update lib/rocprofiler-sdk/hsa/utils.hpp
- stringize_impl print dereferenced pointers when possible
* Update lib/rocprofiler-sdk/tests/intercept_table.cpp
- remove failures for intercepting HIP API tables
* Update include/rocprofiler-sdk/fwd.h
- add ROCPROFILER_HIP_RUNTIME_LIBRARY (== ROCPROFILER_HIP_LIBRARY)
- add ROCPROFILER_HIP_COMPILER_LIBRARY
* Update lib/rocprofiler-sdk/buffer_tracing.cpp
- Support ROCPROFILER_BUFFER_TRACING_HIP_API in rocprofiler_query_buffer_tracing_kind_operation_name
- Support ROCPROFILER_BUFFER_TRACING_HIP_API in rocprofiler_iterate_buffer_tracing_kind_operations
* Update lib/rocprofiler-sdk/callback_tracing.cpp
- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_query_callback_tracing_kind_operation_name
- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_iterate_callback_tracing_kind_operations
- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_iterate_callback_tracing_kind_operation_args
* Update lib/rocprofiler-sdk/intercept_table.cpp
- support HipDispatchTable and HipCompilerDispatchTable
* Update lib/rocprofiler-sdk/internal_threading.cpp
- Support ROCPROFILER_HIP_COMPILER_LIBRARY
* Update lib/rocprofiler-sdk/registration.cpp
- Support "hip" and "hip_compiler" in rocprofiler_set_api_table
- Added some extra logging
* Update samples/api_{buffered,callback}_tracing
- Modifications to demonstrate HIP API tracing
* Update tests/kernel-tracing
- Modifications to handle/test HIP API tracing
* Separate HIP tracing from HIP compiler tracing
* Fix installation of include/rocprofiler-sdk/hip/*
- add compiler and table headers to install
* Fixes to HIP interception
- hip_api_trace.hpp was updated a bit
- removed hipGetDeviceProperties (generic)
- added hipGetDevicePropertiesR0600
- added hipGetDevicePropertiesR0000
- removed hipRegisterTracerCallback
- reordered hipCreateChannelDesc, hipExtModuleLaunchKernel, hipHccModuleLaunchKernel
- added hipDrvGraphAddMemsetNode
- static asserts in hsa_api_info ensuring ordering of pointers
* Update lib/rocprofiler-sdk/hip/hip.*
- use size_t instead of rocprofiler_hip_table_api_id_t as non-type template parameter (smaller binary)
- separated out population of callback_context_data and buffered_context_data into non-template function (significantly smaller binary)
* Update lib/rocprofiler-sdk/hsa/hsa.*
- separated out population of callback_context_data and buffered_context_data into non-template function (significantly smaller binary)
* Update test/kernel-tracing/validate.py
- does not expect any hip_api_traces until libamdhip.so actually starts using rocprofiler-register
* Update tests/tools/json-tool.cpp
- fix context associated with "HIP_API_CALLBACK"
* Update external/CMakeLists.txt
- move misc variables to top of CMakeLists.txt so they apply to all external subprojects
- BUILD_TESTING (OFF)
- BUILD_SHARED_LIBS (OFF)
- BUILD_OBJECT_LIBS (OFF)
- BUILD_STATIC_LIBS (ON)
- CMAKE_POSITION_INDEPENDENT_CODE (ON)
- CMAKE_VISIBILITY_INLINES_HIDDEN (ON)
- CMAKE_CXX_VISIBILITY_PRESET (hidden)
- disable using libunwind in glog
* Update lib/rocprofiler-{sdk,sdk-tool}/CMakeLists.txt
- remove explicit setting of SKIP_BUILD_RPATH
* Update CMakeLists.txt
- set high-level CMAKE_BUILD_RPATH and CMAKE_INSTALL_RPATH_USE_LINK_PATH
* Update tests/CMakeLists.txt
- include(GNUInstallDirs)
* Update samples/CMakeLists.txt
- include(GNUInstallDirs)
* Update include/rocprofiler-sdk/hip/{compiler_api,api}_args.h
- remove extern "C" due to incompatibility b/t empty struct in C (size 0) vs. empty struct in C++ (size 1)
* Update lib/rocprofiler-sdk/hip/details/ostream.hpp
- clang-tidy fixes
* Update cmake/rocprofiler_linting.cmake
- add a feature for clang tidy exe
* Update lib/rocprofiler-sdk/hip/hip.cpp
- use recursion instead of fold expression due to clang-tidy errors (maximum nesting level exceeded)
* Update lib/rocprofiler-sdk/buffer_tracing.cpp
- fix merge
* Update lib/rocprofiler-sdk/callback_tracing.cpp
- fix merge
* Update bin/rocprofv3
- args for marker, HIP runtime, and HIP compiler tracing
* Update tests/apps/simple-transpose
- use roctx
* Update tests/rocprofv3/tracing
- validate marker API data
* Update lib/rocprofiler-sdk-tool
- support for HIP runtime, HIP compiler, marker API
* Update queue/queue_controller/registration/utility
- call hsa::queue_controller_fini() during finalization
- add a yield function to common/utility.hpp
- implements a thread yield + sleep
- add a sync function to Queue class
- add an iterate_queues member function to QueueController
- this is used to sync each queue during queue_controller_fini()
* Fix data races: queue/context/stable_vector
- stable_vector::emplace_back returns reference
- correlation id map uses stable_vector
- queue_info_session has explicit fields for queue id, hsa agent, rocp agent
- use hsa::get_table() in AsyncSignalHandler
- WriteInterceptor does not use TLS for context array
* Update lib/rocprofiler-sdk/hsa/hsa.*
- static object for API subtables
- accessors for API subtables
- google tests for HSA API subtables
* Update lib/rocprofiler-sdk/hsa/{queue,async_copy}.cpp
- use HSA subtable accessors
* Update rocprofiler_memcheck and CI workflow
- use GCC 13 instead of GCC 11 due to suspected false positives in thread sanitizer
- GCC 13 uses libtsan.so.2
* Update CI workflow
* Update lib/rocprofiler-sdk/counters/{metrics,counters}
- fix possibly dangling reference to a temporary from gcc-13
* Update thread-sanitizer-suppr.txt
- Ignore data races originating in hsa-runtime library
* Update cmake/rocprofiler_memcheck.cmake
- Deduce the sanitizer library to preload by compiling an application and extracting the linked sanitizer library
* Update tests/rocprofv3/tracing/CMakeLists.txt
- add csv files to REQUIRED_FILES and ATTACH_ON_FAIL in validate test
* Update lib/common/container/record_header_buffer.hpp
- fix data race identified by gcc v13 and libtsan.so.2
* Update hip API id, args, and def
- remove hipDrvGraphAddMemsetNode (not part of ROCm 6.0)
* Update lib/common/container/record_header_buffer.hpp
- fix deadlock in save/read/reset
* Update source/docs/CMakeLists.txt
- remove COMMAND_ERROR_IS_FATAL ANY to allow for printing of stdout/stderr
* Update lib/rocprofiler-sdk/hip/details/ostream.hpp
- remove overloads for HIP_MEMSET_NODE_PARAMS
* Update docs/CMakeLists.txt
- use find_program for shell instead of hardcoded /bin/bash
|
||
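The `extern "C"` removal recorded in the HIP API Tracing commit (#357) for `hip/{compiler_api,api}_args.h` rests on a layout difference: GNU C gives a member-less struct size 0, while C++ requires every complete object type to have a non-zero size. A minimal sketch of the C++ side follows; `no_args_t` and `record_t` are hypothetical names chosen for illustration and are not types from rocprofiler-sdk.

```cpp
#include <cstddef>

// Hypothetical stand-in for an API call that takes no arguments.
struct no_args_t
{};

// Hypothetical record embedding the empty struct ahead of a real field.
struct record_t
{
    no_args_t args;
    int       value;
};

// C++ guarantees a non-zero size for an empty struct (GNU C would report 0).
static_assert(sizeof(no_args_t) >= 1, "an empty C++ struct occupies storage");

// Size of the empty struct as seen by a C++ translation unit.
constexpr std::size_t
empty_struct_size()
{
    return sizeof(no_args_t);
}

// Offset of the field that follows the empty struct; in C++ it cannot be 0.
constexpr std::size_t
value_offset()
{
    return offsetof(record_t, value);
}
```

If the same header were compiled as C with the GNU empty-struct extension, `value` could sit at offset 0, so the two languages would disagree about the layout of any record that embeds such a struct.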
|
|
1f4cf1aa39 |
Tools update (#397)
* Srnagara/tool counters collect (#331)
* Adding counter collection capability to tools
* Adding counter collection feature to tools
* Adding counter collection capability to tools
* Fixing merge down issues
* Small tool fixes for build + prevent profile realloc
* Reproducing the counter name query issue in buffered callback
* Minor fix for init order + sample that directly uses sdk-tool for debug purposes
* Adding a temporary fix to print the counter names
* Fixing the output file name and reverting the changes of caching the profile config
* Fixing SGPR_Count value
* cleaning up debug prints
* Adding header to counter collection file
* Adding kernel filtering support
* Remove threading
* Cleaning up the code
* Removing redundant prints
* Revert "Remove threading"
This reverts commit 05c58fb9de826e92cf8d2e3d1c31d5578525dcb4.
* Revert "Cleaning up the code"
This reverts commit 1d964882bf2396dee8ad020cbb6c83b36e0674e9.
* Changing the tools code to align with init-order fix
* cmake formatting (cmake-format) (#335)
Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>
* source formatting (clang-format v11) (#336)
Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>
* Adding support for async memory copy
* source formatting (clang-format v11) (#391)
Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>
* Fixing header typo
* Fixing tool_fini
* Replacing the direction and kind fields values with description
* Update lib/rocprofiler-sdk-tool/helper.cpp
- Remove use of VLA
* Update lib/rocprofiler-sdk-tool/tool.cpp
- Formatting
* Migrate common/config.* to rocprofiler-sdk-tool
* Update lib/rocprofiler-sdk-tool/tool.cpp
- fix clang-tidy issues
* source formatting (clang-format v11) (#392)
Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>
* Update lib/common/mpl.hpp
- is_string_type / is_string_type_impl for deducing if type is a string type
* Update include/rocprofiler-sdk/fwd.h
- ROCPROFILER_BUFFER_TRACING_MEMORY_COPY_NONE starts at zero
* Update lib/rocprofiler-sdk/hsa/async_copy.*
- functions for operation ids and names
* Update lib/rocprofiler-sdk/buffer_tracing.cpp
- support iterating and getting names for ROCPROFILER_BUFFER_TRACING_MEMORY_COPY
* Update lib/rocprofiler-sdk-tool/config.*
- env ROCPROFILER_ prefix -> ROCPROF_ prefix
- add support for memory copy tracing, counter collection, etc.
* Update lib/rocprofiler-sdk-tool/helper.*
- removed TracerFlushRecord
- removed cxa_demangle (use one in common library)
- removed GetCounterNames (handled in config)
- removed GetKernelNames (handled in config)
* Add lib/rocprofiler-sdk-tool/output_file.*
- separate out get_output_stream function and output_file struct from tool.cpp
* Add lib/rocprofiler-sdk-tool/csv.hpp
- write_csv_entry automatically quotes strings
- csv_encoder struct enforces correct number of columns
* Update lib/rocprofiler-sdk-tool/CMakeLists.txt
- add new files
* Update lib/rocprofiler-sdk-tool/tool.cpp
- update construction of output_file class
- add kernel_symbol_data for serializing kernel trace data
- use config instead of env lookups
- optimize counter collection profile config lookup/creation
* Update bin/rocprofv3
- rocprofv3 --help exits with 0 (as it should)
- command-line arg for memory copy tracing
- command-line arg for mangled kernels
- command-line arg for truncated kernels
- env ROCPROFILER_ prefix -> env ROCPROF_ prefix
* Update tests/async-copy-tracing/validate.py
- update test_async_copy_direction to new enum values
* Update tests/kernel-tracing/validate.py
- update test_async_copy_direction to new enum values
* Update tests/tools/json-tool.cpp
- add ROCPROFILER_BUFFER_TRACING_MEMORY_COPY to supported buffer_name_info
* Update samples/counter_collection/{CMakeLists.txt,main.cpp}
- remove counter-collection-sdk-tool
* Update .github/workflows/docs.yml
- fix paths triggering running the workflow
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>
* adding counter collection support
* Adding counter collection test
* changing directory structure of counter collection tests
* Fixing test path for rocprofv3
* Adding hsa-tracing basic test
* cmake formatting (cmake-format) (#362)
Co-authored-by: bgopesh <bgopesh@users.noreply.github.com>
* counter collection tests drop2
* fixing hsa-trace test for rocprofv3 path
* python formatting (black) (#371)
Co-authored-by: bgopesh <bgopesh@users.noreply.github.com>
* both counter collection and tracing should work together
* Fixing rocprofv3 path
* Attempt to fix Segfault with AddressSanitizer
* fixing sanitizer segfault
* Update rocprofv3
* Update lib/rocprofiler-sdk-tool/README.md
- update env variables
* Update lib/rocprofiler-sdk/buffer_tracing.cpp
- return ROCPROFILER_STATUS_BUFFER_NOT_FOUND if buffer tracing service is configured with invalid buffer
* Update lib/rocprofiler-sdk-tool/tool.cpp
- designated hsa API trace buffer
* Update tests/hsa-tracing/CMakeLists.txt
- Fix environment
* Update rocprofv3
- do not override HSA_TOOLS_LIB
- support ROCPROF_PRELOAD
- LD_PRELOAD librocprofiler-sdk.so
* Restructure tests directory
- move all rocprofv3 integration tests into subfolder
* Update cmake/Templates/rocprofiler-sdk/config.cmake.in
- create rocprofiler-sdk::rocprofv3 cmake target
* Update tests/rocprofv3/hsa-tracing
- improve validate.py
- convert input to dict via csv.DictReader
* Update tests/apps/CMakeLists.txt
- fix build rpath for simple-transpose
* Update cmake/rocprofiler_memcheck.cmake
- prefer libtsan.so.0
* Update tests/rocprofv3/hsa-tracing
- move to tests/rocprofv3/tracing
- include kernel tracing and memory copy tracing
* Update lib/rocprofiler-sdk-tool/tool.cpp
- normalize "_ID" vs. "_Id" in CSV column names (use "_Id")
* Update lib/rocprofiler-sdk/buffer.{hpp,cpp}
- change signature of buffer::get_buffers()
- buffer::get_buffers() uses static_object
* Update lib/rocprofiler-sdk/context/context.cpp
- update usage of buffer::get_buffers()
- now returns pointer
* Update lib/rocprofiler-sdk/tests/buffer.cpp
- update to change for signature of buffer::get_buffers()
* Update tests/rocprofv3/tracing/CMakeLists.txt
- use %argt% with -d argument
* Update lib/rocprofiler-sdk-tool/tool.cpp
- use atexit for finalization
* Update tests/rocprofv3/tracing/CMakeLists.txt
- tweaked name of tests
* Update lib/rocprofiler-sdk/hsa/async_copy.*
- async_copy_fini + reference counting signals
* Update lib/rocprofiler-sdk/registration.cpp
- invoke hsa::async_copy_fini() to prevent data race on signals
---------
Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>
Co-authored-by: gobhardw <gopesh.bhardwaj@amd.com>
Co-authored-by: bgopesh <bgopesh@users.noreply.github.com>
|
||
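The Tools update commit (#397) notes that `write_csv_entry` in `csv.hpp` automatically quotes strings. The idea can be sketched roughly as below. This is not the code from `lib/rocprofiler-sdk-tool/csv.hpp`: the names `write_entry_sketch` and `make_row_sketch` are hypothetical, and doubling embedded quotes is an assumed convention (the usual CSV one), not something the commit message states.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical helper: string-like values are wrapped in double quotes,
// everything else is streamed unchanged.
template <typename Tp>
void
write_entry_sketch(std::ostream& os, const Tp& value)
{
    if constexpr(std::is_convertible_v<const Tp&, std::string>)
    {
        os << '"';
        for(char c : std::string(value))
        {
            // assumption: embedded quotes are escaped by doubling them
            if(c == '"') os << '"';
            os << c;
        }
        os << '"';
    }
    else
    {
        os << value;
    }
}

// Hypothetical helper: joins all arguments into one comma-separated row.
template <typename... Args>
std::string
make_row_sketch(const Args&... args)
{
    std::ostringstream os;
    std::size_t        idx = 0;
    // comma fold: emit a separator before every entry except the first
    ((os << (idx++ == 0 ? "" : ","), write_entry_sketch(os, args)), ...);
    return os.str();
}
```

Deciding at compile time whether a type is string-like is the same kind of check the commit attributes to `is_string_type` in `lib/common/mpl.hpp`; the sketch substitutes `std::is_convertible_v` for it.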
|
|
21dd088c8e |
ROCTx Library Tracing (#390)
* Update include/rocprofiler-sdk/marker/*
- Update rocprofiler_marker_api_args_t for all API functions
- Add ROCPROFILER_MARKER_API_ID_roctxGetThreadId to rocprofiler_marker_api_id_t
* Update include/rocprofiler-sdk/marker/api_args.h
- fix include
* Update lib/common/mpl.hpp
- is_pair
- is_type_complete_v
* Update include/rocprofiler-sdk/marker/*
- fix rocprofiler_marker_api_retval_t
- add roctxGetThreadId to rocprofiler_marker_api_args_t
- fix type in enum: HsaDevice -> HsaAgent
- add table_api_id.h
* Update include/rocprofiler-sdk/marker.h
- include marker/table_api_id.h
* Update include/rocprofiler-sdk/buffer_tracing.h
- Buffer marker tracer records have begin and end timestamp
* Add lib/rocprofiler-sdk/marker
- tracing implementation for marker (roctx) library
* Update include/rocprofiler-sdk/{buffer_tracing,marker/table_api_id}.h
- rocprofiler_buffer_tracing_marker_record_t -> rocprofiler_buffer_tracing_marker_api_record_t
* Update lib/rocprofiler-sdk/buffer_tracing.cpp
- support for ROCPROFILER_BUFFER_TRACING_MARKER_API
* Update lib/rocprofiler-sdk/callback_tracing.cpp
- support for ROCPROFILER_CALLBACK_TRACING_MARKER_API
* Update lib/rocprofiler-sdk/intercept_table.cpp
- template instantiation for notify_runtime_api_registration
* Update lib/rocprofiler-sdk/registration.cpp
- enable roctx in rocprofiler_set_api_table
* Update lib/rocprofiler-sdk/marker/marker.cpp
- rocprofiler_buffer_tracing_marker_record_t -> rocprofiler_buffer_tracing_marker_api_record_t
* Update lib/rocprofiler/tests for roctx testing
- add roctx.cpp
- unit tests for roctx callback and buffer tracing
- support marker API in get_{buffer,callback}_tracing_names()
* Update lib/common/logging.cpp
- logging initialized message mentions env variable
* Update lib/common/mpl.hpp
- NOLINT for misc-definitions-in-headers
* Update lib/rocprofiler-sdk/tests/CMakeLists.txt
- include LD_LIBRARY_PATH in rocprofiler-lib-tests-shared tests
* Update lib/rocprofiler-sdk/registration.cpp
- client_library_vec_t is now vector of option<client_library>
- enables resetting the client_library after finalization
- removed acquiring registration lock when invoke_client_finalizers called via atexit
- this was causing some lock-order-inversion warnings (potential deadlock)
* Update lib/rocprofiler-sdk/agent.cpp
- model name for agent supports spaces
* Update tests/common/serialization.hpp
- add serialization support for marker tracing data structures
* Update tests/apps
- Add ROCTx markers into reproducible-runtime and transpose
* Update tests/tools/json-tools.cpp
- add marker tracing support
- remove strdup (no longer necessary)
* Update tests/kernel-tracing/validate.py
- validate marker API tracing data
* Update tests/async-copy-tracing/validate.py
- validate marker API tracing data
* Update cmake for load path resolution during testing
* Update tests/async-copy-tracing/CMakeLists.txt
- fix test LD_LIBRARY_PATH
* Update cmake/Templates/rocprofiler-sdk-roctx/config.cmake.in
- fix constructing rocprofiler-sdk-roctx::rocprofiler-sdk-roctx
|
||
|
|
1edd4891b2 |
ROCTx Library (#360)
* Initial implementation of roctx library
* Update include/roctx/CMakeLists.txt
- fix installation
* Update cmake/rocprofiler_config_packaging.cmake
- add rocprofiler-sdk-roctx installer
* Update include/roctx/CMakeLists.txt
- include api_trace.h in installation
* Update include/roctx/api_trace.h
- add ROCTX_API_TABLE_VERSION_MAJOR define
- add ROCTX_API_TABLE_VERSION_STEP define
* Update lib/roctx/roctx.cpp
- static asserts for table size and struct member offsets
* Update external/CMakeLists.txt
- move BUILD_SHARED_LIBS to top
- disable libunwind for glog
* Update lib/roctx/CMakeLists.txt
- Update {BUILD,INSTALL}_RPATH
* Relocate include/roctx to include/rocprofiler-sdk/roctx
* Relocate lib/roctx to lib/rocprofiler-sdk-roctx
- change the name of the library from libroctx to librocprofiler-sdk-roctx
* Move lib/plugins to lib/rocprofiler-sdk-tool/plugins
- also change install export group
* Update lib/rocprofiler-sdk/CMakeLists.txt
- change rocprofiler-shared-library EXPORT group (rocprofiler-sdk-library-targets -> rocprofiler-sdk-targets)
* Update cmake/rocprofiler_utilities.cmake
- change install EXPORT group
- rocprofiler-sdk-library-targets -> rocprofiler-sdk-targets
* Update CMakeLists.txt
- set PACKAGE_NAME at high level
- include(rocprofiler_config_install_roctx)
* Update cmake/rocprofiler_config_install* and cmake/Templates/*.cmake.in
- added rocprofiler_config_install_roctx.cmake for installing roctx as a package
- reorganization of existing cmake/Templates/*-config.cmake.in files
- created new config.cmake.in and build-config.cmake.in for rocprofiler-sdk-roctx
* Relocate include/rocprofiler-sdk/roctx to include/rocprofiler-sdk-roctx
* Update rocprofiler_config_install_roctx.cmake
* Update lib/rocprofiler-sdk-roctx/roctx.cpp
- update include paths
* Update lib/rocprofiler-sdk-roctx/CMakeLists.txt
- change target name to have rocprofiler-sdk- prefix
- interface target_include_directories
- define export symbol
* source formatting (clang-format v11) (#361)
Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>
* Update include/rocprofiler-sdk/fwd.h
- fix doxygen markup for ROCPROFILER_STATUS_ERROR_CONTEXT_ERROR
* Update modulefile and setup-env.sh
* Update cmake/Templates/rocprofiler-sdk/config.cmake.in
- fix inclusion of rocprofiler-sdk-targets.cmake
* Update include/rocprofiler-sdk-roctx
- add types.h for typedefs
- add doxygen comments for roctx.h
- add roctxGetThreadId function
- roctxProfilerStart and roctxProfilerStop accept thread ID param
* Update lib/rocprofiler-sdk-roctx/roctx.cpp
- hsa_agent_t* -> hsa_agent_s*
* Update lib/rocprofiler-sdk-roctx/roctx.cpp
- support for roctxGetThreadId
- update signatures of roctxProfilerPause and roctxProfilerResume
* Update lib/rocprofiler-sdk-roctx/roctx.cpp
- Initialize logging with ROCTX_LOG_LEVEL
* Update include/rocprofiler-sdk-roctx/roctx.h
- remove ROCTX_NONNULL for ihipStream_t parameter in roctxNameHipStream because default stream is a nullptr
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
|
||
|
|
dc8b8aa448 |
Cleanup + logging env variable (#387)
* [CP] Update tests/common/serialization.hpp
- remove duplication in rocprofiler_callback_tracing_code_object_load_data_t
* [CP] Update lib/rocprofiler-sdk/tests
- create common.hpp
- update registration.cpp to use common.hpp
* [CP] Add lib/common/logging.{hpp,cpp}
- generic init_logging function
* [CP] Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- remove excess logging
* [CP] Update lib/rocprofiler-sdk/registration.cpp
- use common::init_logging(...)
- enforce ROCPROFILER_REGISTER_FORCE_LOAD in rocprofiler_force_configure
- logging updates in rocprofiler_set_api_table
* Update include/rocprofiler-sdk/buffer_tracing.h
- rocprofiler_buffer_tracing_marker_record_t -> rocprofiler_buffer_tracing_marker_api_record_t
* Update lib/common/utility.hpp
- remove active_capacity_gate
* Update lib/rocprofiler-sdk/tests/common.hpp
- fix get_{callback,buffer}_tracing_names()
* Update lib/rocprofiler-sdk/counters/xml/{basic,derived}_counters.xml
- add entries for gfx1102
|
||
|
|
0952308c4a |
Add check to ensure metrics are valid on GPU Arch (#384)
* Add check to ensure metrics are valid on GPU Arch
Ensure requested metrics are valid on the GPU arch. If not valid, error is returned during profile config init.
* source formatting (clang-format v11) (#385)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* Update metrics.cpp
* source formatting (clang-format v11) (#386)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* Update metrics.cpp
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
|
||
|
|
936816f762 |
Async memory copy tracing (#317)
* Update samples/api_buffered_tracing/client.cpp
- support ROCPROFILER_BUFFER_TRACING_MEMORY_COPY
* Update include/rocprofiler-sdk/{buffer_tracing,fwd}.h
- update rocprofiler_buffer_tracing_memory_copy_record_t
- add ROCPROFILER_BUFFER_TRACING_MEMORY_COPY_HOST_TO_HOST to rocprofiler_memory_copy_operation_t
* Update lib/rocprofiler-sdk/context/context.*
- get_registered_contexts functions (local copy)
* Update tests/apps/reproducible-runtime/reproducible-runtime.cpp
- include some memory allocations and memory copies for better testing
* Update tests/common/serialization.hpp
- update serialization save function for rocprofiler_buffer_tracing_memory_copy_record_t
* Update lib/rocprofiler-sdk/hsa/hsa.*
- remove stale set_callback / activity_functor_t code
- forward decl hsa_api_meta
- template struct hsa_api_func for getting function return type and args
* Update tests/kernel-tracing/validate.py
- enforce memory_copies data size
- test timestamps in memory copies data
- improve internal and external correlation id validation
* Update lib/rocprofiler-sdk/hsa/defines.hpp
- HSA_API_META_DEFINITION macro
* Update lib/rocprofiler/hsa/rocprofiler-sdk/hsa/hsa.def.cpp
- HSA_API_META_DEFINITION specializations for async copy functions
* Add lib/rocprofiler-sdk/hsa/async_copy.{hpp,cpp}
- implements buffer memory tracing
* Update lib/rocprofiler-sdk/registration.cpp
- invoke rocprofiler::hsa::async_copy_init
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- logging improvements
- improve hsa <-> rocp agent mapping
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- load original signal in async signal handler before store_screlease
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- use store_relaxed instead of store_screlease
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- logging
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- logging
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- misc changes
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- misc changes
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- misc changes
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- return function pointer instead of lambda
* Update reproducible-runtime.cpp
- device sync
* Update tests/apps/reproducible-runtime/reproducible-runtime.cpp
- use *Async variants of hipMalloc and hipMemcpy
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- populate async data properly
* Update tests/kernel-tracing/validate.py
- verification of async copy direction
* Update tests/apps/reproducible-runtime/reproducible-runtime.cpp
- temporarily disable async memcpy functions
* Create tests/tools
- directory containing tool libraries used for collecting data in integration tests
* Update tests/kernel-tracing
- remove kernel-tracing-test-tool library (now rocprofiler-sdk-json-tool)
- update cmake, validate.py, conftest.py accordingly
* Add tests/async-copy-tracing
- integration test validating async copy tracing in transpose example
* Update tests/CMakeLists.txt
- updates for restructuring
* Revert tests/apps/reproducible-runtime
- restore code to semi-original state (no memory copying)
* Update tests/async-copy-tracing/validate.py
- fix comment in test_async_copy_direction
* Fix building tests against installation
|
||
|
|
6b374b8e68 |
Improve static singleton memory safety (#316)
* Update GitHub links
* Update samples/api_buffered_tracing/client.cpp
- check if initialized before forcing initialization
* Add lib/common/static_object.*
- template class for creating a static allocation in the binary which has all the properties of a heap allocated singleton but does not trigger leak sanitizers
* Update include/rocprofiler-sdk/internal_threading.h
- document return values
* Update lib/rocprofiler-sdk/internal_threading.cpp
- return codes from rocprofiler_create_callback_thread and rocprofiler_assign_callback_thread
- use common::static_object for thread-pool object
* Update lib/rocprofiler-sdk/agent.cpp
- use common::static_object to store array of strings and their hashes
* Update lib/rocprofiler-sdk/hsa/code_object.cpp
- use common::static_object to store array of strings and their hashes to ensure strings exist until termination
* Update lib/rocprofiler-sdk/registration.cpp
- use common::static_object to store status and client libraries
- update return values for rocprofiler_set_api_table
* Update lib/rocprofiler-sdk/hsa/hsa.cpp
- check registration::get_fini_status() in hsa_api_impl::functor<Idx>(args...)
* Update lib/rocprofiler-sdk/context/context.cpp
- using common::static_object for correlation id map
|
||
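The static singleton commit (#316) describes `common::static_object` as a static allocation that behaves like a heap-allocated singleton without triggering leak sanitizers. One way such a class could work is placement-new into a buffer with static storage duration, as in the sketch below. This is an illustration under assumptions, not the implementation in `lib/common/static_object.*`: the name `static_object_sketch`, its member functions, and `get_model_name` are hypothetical, and the sketch is not thread-safe.

```cpp
#include <new>
#include <string>
#include <utility>

// Hypothetical sketch: the object lives in a static buffer, so nothing is
// obtained from the heap for the object itself, and its destructor is never
// run at exit (mirroring an intentionally leaked heap singleton).
template <typename Tp>
class static_object_sketch
{
public:
    // Constructs the object on the first call; later calls ignore their
    // arguments and return the existing instance. Not thread-safe.
    template <typename... Args>
    static Tp* construct(Args&&... args)
    {
        if(!m_object)
            m_object = ::new(static_cast<void*>(m_buffer)) Tp(std::forward<Args>(args)...);
        return m_object;
    }

    // Returns the instance, or nullptr if construct() has not been called.
    static Tp* get() { return m_object; }

private:
    // C++17 inline statics: one buffer and one pointer per Tp.
    alignas(Tp) static inline unsigned char m_buffer[sizeof(Tp)] = {};
    static inline Tp* m_object                                   = nullptr;
};

// Hypothetical usage: a string that stays valid until process termination.
inline std::string*
get_model_name()
{
    return static_object_sketch<std::string>::construct("example-agent");
}
```

Because the buffer has static storage duration, anything the object owns stays reachable from a global, which is typically why leak checkers do not report it.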
|
|
8ed68ce4f3 |
Update packaging (#306)
* Restructured tests
- support standalone compilation
- move tests/kernel-tracing/serialization.hpp to tests/common/serialization.hpp
- created tests/common library
- handle cloning of cereal library in standalone build
* Update install and packaging
* Update cmake/rocprofiler_config_packaging.cmake
- condense core, samples, development, and tools install components into single rocprofiler-sdk package
- keep tests install component in separate rocprofiler-sdk-tests package
* Update CI workflow to test install and packaging
* Update CI workflow
- install newer cmake for packaging checks
* Update cmake/rocprofiler_config_packaging.cmake
- disable auto-generation of shared-lib deps and provides for tests package
* Update CI workflow
- add sbin to PATH for dpkg install
* Update CI workflow
- remove using github.workspace when installing packages
* Update CI workflow
- hack to fix ordering of dpkg install
* Update CI workflow
- whitespace cleanup
|
||
|
|
0666f6a197 |
AmdExtTable updated (#292)
* AmdExtTable updated
* hsa_amd_agent_set_async_scratch_limit introduced
* source formatting (clang-format v11) (#294)
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
|
||
|
|
d2a6eec1bf |
Added kernel id to enqueue callback for kernel dispatch (#276)
Adds kernel id as parameter to rocprofiler_profile_counting_dispatch_callback_t. Small cleanup of code in core.cpp. |
||
|
|
1c02e7a92a |
Update documentation (#275)
- finished most of the TODOs |
||
|
|
022d7abc29 |
Documentation Update For Counters (#246)
* Documentation Update
* Minor fixes
* source formatting (clang-format v11) (#265)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
|
||
|
|
9a0c84efa6 |
Use -sdk suffix and reset VERSION to 0.0.0 (#263)
* Fix find_package(rocprofiler) in build tree
* Move include/rocprofiler to include/rocprofiler-sdk
* Update include/CMakeLists.txt
- add_subdirectory(rocprofiler-sdk)
* Move lib/rocprofiler to lib/rocprofiler-sdk
* Move lib/rocprofiler-tool to lib/rocprofiler-sdk-tool
* Update lib/CMakeLists.txt
- add_subdirectory(rocprofiler-sdk)
- add_subdirectory(rocprofiler-sdk-tool)
* Update lib/rocprofiler-sdk/CMakeLists.txt
* Rename rocprofiler-tool to rocprofiler-sdk-tool
* Replace include rocprofiler/ with include rocprofiler-sdk/
* Replace include lib/rocprofiler/ with include lib/rocprofiler-sdk/
* Set VERSION to 0.0.0 and finish install to rocprofiler-sdk
* More fixes for rocprofiler -> rocprofiler-sdk
- fix issue with rocprofiler-sdk-config.cmake.in
- fix counters xml install path
* Fix documentation generation
* Create rocprofiler_LIB_ROCPROFILER_SDK_DIR for build tree
* cmake formatting (cmake-format) (#264)
Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
|
||
|
|
fe5d074375 |
Misc updates for distribution (#233)
* Adding tools support * cmake formatting (cmake-format) (#227) Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com> * Checking to do rebase * Adding rocprofv2 script * cmake formatting (cmake-format) (#229) Co-authored-by: bgopesh <bgopesh@users.noreply.github.com> * Fixing build for the tool * Removing the requirement for rocm_version * Update rocprofiler_utilities.cmake * C++ filesystem fixes - added source/lib/common/filesystem.hpp - support older compilers which have <experimental/filesystem> and do not have <filesystem> - added samples/common/filesystem.hpp - samples now depend on "common" library which provides the correct filesystem header - renamed rocprofiler-stdcxxfs interface target to rocprofiler-cxx-filesystem - support old LLVM in addition to GNU - fix bin/rocprof/rocprof.cpp - was using VLA * Fix rocprofiler-drm include directories - OpenSUSE only has include/libdrm/drm.h (no include/drm/drm.h) * Tools fixes * Fix for the tools * Fix rocprofv2 script * Fixing Filesystem Issues * source formatting (clang-format v11) (#234) Co-authored-by: ammarwa <ammarwa@users.noreply.github.com> * Vlaindic/pc sampling api update (#235) * pcs: updating PC sampling API * source formatting (clang-format v11) (#232) Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> --------- Co-authored-by: vlaindic <vladimir.indic@amd.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * Vlaindic/pc sampling api update for ammar branch (#244) *Updating the documentation inside pc_sampling.h --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> * pcs: use @p in front of params * pcs: documenting struct fields updated * Fixing PC Sampling Documentation issues * Fixing PC Sampling Documentation * Relocated tools directory to 
source/lib/rocprofiler-tool * Fixes/updates to rocprofiler-tool - updated CMake - Fixed miscellaneous issues in the code (VLAs, etc.) - Updated rocprofv2 to reflect some minor env variables changes in rocprofiler-tool - Fixed clang-tidy warnings * Update lib/rocprofiler-tool/CMakeLists.txt - link to atomic library * Add $ORIGIN/.. RUNPATH to rocprofiler-tool * Adding readme file for tools * Renaming the tools readme file * Update ReadMe.md * Update ReadMe.md * Documentation updates - overview and explanation of design and concepts * Fix lib/rocprofiler-tool/README.md - delete ReadMe.md * Hacks for build * Update Filesystem * cmake formatting (cmake-format) (#248) Co-authored-by: ammarwa <ammarwa@users.noreply.github.com> * source formatting (clang-format v11) (#249) Co-authored-by: ammarwa <ammarwa@users.noreply.github.com> * source formatting (clang-format v11) (#250) Co-authored-by: ammarwa <ammarwa@users.noreply.github.com> * Addressing review comments on the tool readme file * Revert "Hacks for build" This reverts commit d6688cb3d1226c46fc97e37ced889a5b0d180940. 
* Fixes for GCC 7.5 compiler in OpenSUSE 15.4 * Update lib/rocprofiler-tool/CMakeLists.txt - link to AQL profile library * Fix lib/rocprofiler-tool/README.md - fix markdown * Fix lib/rocprofiler-tool - fix usage of hsa_ven_amd_loader_query_host_address * Fix unused variable warnings - byproduct of variables only used in assert statements * Update docs - update about.md - more "Important Changes" section here - update tool_library_overview.md - extend "Tool Library Design" section - write "Tool Initialization" section - write "Tool Finalization" section * Add ghc::filesystem submodule * Implement usage of ghc::filesystem * Add ROCPROFILER_BUILD_GHC_FS option - option to use external/filesystem (ghc) * Update samples/counter-collection - compile flags - common library - fixes for warnings * Update tests/kernel-tracing/CMakeLists.txt - change install location of kernel-tracing-test-tool and install rpath * Update samples/common/CMakeLists.txt - compile features requiring C++17 * Update lib/rocprofiler-tool/tool.cpp - remove include <filesystem> - comment out unused variable - remove unused functions - move some functions into anonymous namespace --------- Co-authored-by: Sriraksha Nagaraj <Sriraksha.Nagaraj@amd.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com> Co-authored-by: gobhardw <gopesh.bhardwaj@amd.com> Co-authored-by: bgopesh <bgopesh@users.noreply.github.com> Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com> Co-authored-by: ammarwa <ammarwa@users.noreply.github.com> Co-authored-by: vlaindic <vladimir.indic@amd.com> Co-authored-by: vlaindic <vlaindic@users.noreply.github.com> Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com> Co-authored-by: Benjamin Welton <bewelton@amd.com> Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com> |
||
|
|
e8a5845661 |
Buffered Counter Collection API (#179)
* Added buffer counter collection API. Initial testing added into counter-collection sample. Added support for constant metrics in counter collection (#194) * Added support for constant metrics in counter collection Adds support and test cases for constant metrics (such as max wave size) and adds the metric kernel duration (though this is still not yet calculated). * Minor doc updates * Simple counter unit tests (#199) * Simple counter unit tests Unit tests and some minor fixes for simple and derived counter evaluation * Added unit tests for reduction operations (#200) * Added unit tests for reduction operations * added tests for combo (constant+regular) counters (#201) source formatting (clang-format v11) (#202) Co-authored-by: bwelton <bwelton@users.noreply.github.com> source formatting (clang-format v11) (#203) Co-authored-by: bwelton <bwelton@users.noreply.github.com> Local changes source formatting (clang-format v11) (#205) Co-authored-by: bwelton <bwelton@users.noreply.github.com> Minor doc fix Remove kernel_duration, migrate over set_dimensions to after HSA init source formatting (clang-format v11) (#207) Co-authored-by: bwelton <bwelton@users.noreply.github.com> Added output to ROCPROFILER_SAMPLE_OUTPUT_FILE: * Remove integer based counter in return struct This causes a lot of complications and seems to provide limited benefit over just treating all counters as doubles. For ease of use, drop the integer based counter. * source formatting (clang-format v11) (#217) Co-authored-by: bwelton <bwelton@users.noreply.github.com> * Add correlation id support to counters (#218) Adds correlation id support to counter collection. Requires tracing to be enabled to return any useful value currently (since we do not have HIP kernel tracing yet). 
* source formatting (clang-format v11) (#223) Co-authored-by: bwelton <bwelton@users.noreply.github.com> * Add sample that attempts to fetch all counters On whatever machine this test is run, an attempt is made to fetch every counter available on the platform from a kernel execution. Each counter is fetched once to check that it can be fetched on the platform and that it returns the correct instance count (however, due to the lack of transparency from AQL profiler, this check is not functional for some counters). We do not do any implicit reduction on any counter; as a result, we see more counters than the number of events requested. Below is the status of all counters on MI210. All counters appear functional with the changes in this PR. However, the instance count returned will be greater than that returned by rocprofiler_query_counter_instance_count. Got 516 counters collected Counter ID: 0 (size) expected 1 instances and got 1 Counter ID: 1 (processor_id_low) expected 1 instances and got 1 Counter ID: 2 (capability) expected 1 instances and got 1 Counter ID: 3 (local_mem_size) expected 1 instances and got 1 Counter ID: 4 (min_latency) expected 1 instances and got 1 Counter ID: 5 (weight) expected 1 instances and got 1 Counter ID: 6 (node_from) expected 1 instances and got 1 Counter ID: 7 (version_major) expected 1 instances and got 1 Counter ID: 8 (version_minor) expected 1 instances and got 1 Counter ID: 9 (mem_clk_max) expected 1 instances and got 1 Counter ID: 10 (num_xcc) expected 1 instances and got 1 Counter ID: 11 (width) expected 1 instances and got 1 Counter ID: 12 (flags) expected 1 instances and got 1 Counter ID: 13 (size_in_bytes) expected 1 instances and got 1 Counter ID: 14 (array_count) expected 1 instances and got 1 Counter ID: 15 (num_gws) expected 1 instances and got 1 Counter ID: 16 (simd_id_base) expected 1 instances and got 1 Counter ID: 17 (max_waves_per_simd) expected 1 instances and got 1 
Counter ID: 18 (sdma_fw_version) expected 1 instances and got 1 Counter ID: 19 (gfx_target_version) expected 1 instances and got 1 Counter ID: 20 (max_bandwidth) expected 1 instances and got 1 Counter ID: 21 (cpu_core_id_base) expected 1 instances and got 1 Counter ID: 22 (cache_line_size) expected 1 instances and got 1 Counter ID: 23 (level) expected 1 instances and got 1 Counter ID: 24 (min_bandwidth) expected 1 instances and got 1 Counter ID: 25 (location_id) expected 1 instances and got 1 Counter ID: 26 (wave_front_size) expected 1 instances and got 1 Counter ID: 27 (lds_size_in_kb) expected 1 instances and got 1 Counter ID: 28 (simd_count) expected 1 instances and got 1 Counter ID: 29 (fw_version) expected 1 instances and got 1 Counter ID: 30 (recommended_transfer_size) expected 1 instances and got 1 Counter ID: 31 (simd_per_cu) expected 1 instances and got 1 Counter ID: 32 (association) expected 1 instances and got 1 Counter ID: 33 (mem_banks_count) expected 1 instances and got 1 Counter ID: 34 (latency) expected 1 instances and got 1 Counter ID: 35 (max_latency) expected 1 instances and got 1 Counter ID: 36 (cpu_cores_count) expected 1 instances and got 1 Counter ID: 37 (io_links_count) expected 1 instances and got 1 Counter ID: 38 (domain) expected 1 instances and got 1 Counter ID: 39 (max_engine_clk_fcompute) expected 1 instances and got 1 Counter ID: 40 (caches_count) expected 1 instances and got 1 Counter ID: 41 (simd_arrays_per_engine) expected 1 instances and got 1 Counter ID: 42 (cache_lines_per_tag) expected 1 instances and got 1 Counter ID: 43 (gds_size_in_kb) expected 1 instances and got 1 Counter ID: 44 (cu_per_simd_array) expected 1 instances and got 1 Counter ID: 45 (type) expected 1 instances and got 1 Counter ID: 46 (max_slots_scratch_cu) expected 1 instances and got 1 Counter ID: 47 (vendor_id) expected 1 instances and got 1 Counter ID: 48 (device_id) expected 1 instances and got 1 Counter ID: 49 (heap_type) expected 1 instances and got 1 
Counter ID: 50 (drm_render_minor) expected 1 instances and got 1 Counter ID: 51 (num_sdma_engines) expected 1 instances and got 1 Counter ID: 52 (node_to) expected 1 instances and got 1 Counter ID: 53 (num_sdma_xgmi_engines) expected 1 instances and got 1 Counter ID: 54 (num_sdma_queues_per_engine) expected 1 instances and got 1 Counter ID: 55 (hive_id) expected 1 instances and got 1 Counter ID: 56 (num_cp_queues) expected 1 instances and got 1 Counter ID: 57 (max_engine_clk_ccompute) expected 1 instances and got 1 Counter ID: 517 (MAX_WAVE_SIZE) expected 1 instances and got 1 Counter ID: 518 (SE_NUM) expected 1 instances and got 1 Counter ID: 519 (SIMD_NUM) expected 1 instances and got 1 Counter ID: 520 (CU_NUM) expected 1 instances and got 1 [ERROR]Counter ID: 521 (SQ_WAIT_INST_LDS) expected 1 instances and got 8 [ERROR]Counter ID: 522 (TCP_TCP_TA_DATA_STALL_CYCLES) expected 16 instances and got 128 Counter ID: 523 (GRBM_COUNT) expected 1 instances and got 1 Counter ID: 524 (GRBM_GUI_ACTIVE) expected 1 instances and got 1 Counter ID: 525 (GRBM_CP_BUSY) expected 1 instances and got 1 Counter ID: 526 (GRBM_SPI_BUSY) expected 1 instances and got 1 Counter ID: 527 (GRBM_TA_BUSY) expected 1 instances and got 1 Counter ID: 528 (GRBM_TC_BUSY) expected 1 instances and got 1 Counter ID: 529 (GRBM_CPC_BUSY) expected 1 instances and got 1 Counter ID: 530 (GRBM_CPF_BUSY) expected 1 instances and got 1 Counter ID: 531 (GRBM_UTCL2_BUSY) expected 1 instances and got 1 Counter ID: 532 (GRBM_EA_BUSY) expected 1 instances and got 1 Counter ID: 533 (CPC_ME1_BUSY_FOR_PACKET_DECODE) expected 1 instances and got 1 Counter ID: 534 (CPC_UTCL1_STALL_ON_TRANSLATION) expected 1 instances and got 1 Counter ID: 535 (CPC_CPC_STAT_BUSY) expected 1 instances and got 1 Counter ID: 536 (CPC_CPC_STAT_IDLE) expected 1 instances and got 1 Counter ID: 537 (CPC_CPC_STAT_STALL) expected 1 instances and got 1 Counter ID: 538 (CPC_CPC_TCIU_BUSY) expected 1 instances and got 1 Counter ID: 539 
(CPC_CPC_TCIU_IDLE) expected 1 instances and got 1 Counter ID: 540 (CPC_CPC_UTCL2IU_BUSY) expected 1 instances and got 1 Counter ID: 541 (CPC_CPC_UTCL2IU_IDLE) expected 1 instances and got 1 Counter ID: 542 (CPC_CPC_UTCL2IU_STALL) expected 1 instances and got 1 Counter ID: 543 (CPC_ME1_DC0_SPI_BUSY) expected 1 instances and got 1 Counter ID: 544 (CPF_CMP_UTCL1_STALL_ON_TRANSLATION) expected 1 instances and got 1 Counter ID: 545 (CPF_CPF_STAT_BUSY) expected 1 instances and got 1 Counter ID: 546 (CPF_CPF_STAT_IDLE) expected 1 instances and got 1 Counter ID: 547 (CPF_CPF_STAT_STALL) expected 1 instances and got 1 Counter ID: 548 (CPF_CPF_TCIU_BUSY) expected 1 instances and got 1 Counter ID: 549 (CPF_CPF_TCIU_IDLE) expected 1 instances and got 1 Counter ID: 550 (CPF_CPF_TCIU_STALL) expected 1 instances and got 1 [ERROR]Counter ID: 551 (SPI_CSN_WINDOW_VALID) expected 1 instances and got 8 [ERROR]Counter ID: 552 (SPI_CSN_BUSY) expected 1 instances and got 8 [ERROR]Counter ID: 553 (SPI_CSN_NUM_THREADGROUPS) expected 1 instances and got 8 [ERROR]Counter ID: 554 (SPI_CSN_WAVE) expected 1 instances and got 8 [ERROR]Counter ID: 555 (SPI_RA_REQ_NO_ALLOC) expected 1 instances and got 8 [ERROR]Counter ID: 556 (SPI_RA_REQ_NO_ALLOC_CSN) expected 1 instances and got 8 [ERROR]Counter ID: 557 (SPI_RA_RES_STALL_CSN) expected 1 instances and got 8 [ERROR]Counter ID: 558 (SPI_RA_TMP_STALL_CSN) expected 1 instances and got 8 [ERROR]Counter ID: 559 (SPI_RA_WAVE_SIMD_FULL_CSN) expected 1 instances and got 8 [ERROR]Counter ID: 560 (SPI_RA_VGPR_SIMD_FULL_CSN) expected 1 instances and got 8 [ERROR]Counter ID: 561 (SPI_RA_SGPR_SIMD_FULL_CSN) expected 1 instances and got 8 [ERROR]Counter ID: 562 (SPI_RA_LDS_CU_FULL_CSN) expected 1 instances and got 8 [ERROR]Counter ID: 563 (SPI_RA_BAR_CU_FULL_CSN) expected 1 instances and got 8 [ERROR]Counter ID: 564 (SPI_RA_BULKY_CU_FULL_CSN) expected 1 instances and got 8 [ERROR]Counter ID: 565 (SPI_RA_TGLIM_CU_FULL_CSN) expected 1 instances and got 8 
[ERROR]Counter ID: 566 (SPI_RA_WVLIM_STALL_CSN) expected 1 instances and got 8 [ERROR]Counter ID: 567 (SPI_SWC_CSC_WR) expected 1 instances and got 8 [ERROR]Counter ID: 568 (SPI_VWC_CSC_WR) expected 1 instances and got 8 [ERROR]Counter ID: 569 (SQ_ACCUM_PREV) expected 1 instances and got 8 [ERROR]Counter ID: 570 (SQ_CYCLES) expected 1 instances and got 8 [ERROR]Counter ID: 571 (SQ_BUSY_CYCLES) expected 1 instances and got 8 [ERROR]Counter ID: 572 (SQ_WAVES) expected 1 instances and got 8 [ERROR]Counter ID: 573 (SQ_LEVEL_WAVES) expected 1 instances and got 8 [ERROR]Counter ID: 574 (SQ_WAVES_EQ_64) expected 1 instances and got 8 [ERROR]Counter ID: 575 (SQ_WAVES_LT_64) expected 1 instances and got 8 [ERROR]Counter ID: 576 (SQ_WAVES_LT_48) expected 1 instances and got 8 [ERROR]Counter ID: 577 (SQ_WAVES_LT_32) expected 1 instances and got 8 [ERROR]Counter ID: 578 (SQ_WAVES_LT_16) expected 1 instances and got 8 [ERROR]Counter ID: 579 (SQ_BUSY_CU_CYCLES) expected 1 instances and got 8 [ERROR]Counter ID: 580 (SQ_ITEMS) expected 1 instances and got 8 [ERROR]Counter ID: 581 (SQ_INSTS) expected 1 instances and got 8 [ERROR]Counter ID: 582 (SQ_INSTS_VALU) expected 1 instances and got 8 [ERROR]Counter ID: 583 (SQ_INSTS_VALU_ADD_F16) expected 1 instances and got 8 [ERROR]Counter ID: 584 (SQ_INSTS_VALU_MUL_F16) expected 1 instances and got 8 [ERROR]Counter ID: 585 (SQ_INSTS_VALU_FMA_F16) expected 1 instances and got 8 [ERROR]Counter ID: 586 (SQ_INSTS_VALU_TRANS_F16) expected 1 instances and got 8 [ERROR]Counter ID: 587 (SQ_INSTS_VALU_ADD_F32) expected 1 instances and got 8 [ERROR]Counter ID: 588 (SQ_INSTS_VALU_MUL_F32) expected 1 instances and got 8 [ERROR]Counter ID: 589 (SQ_INSTS_VALU_FMA_F32) expected 1 instances and got 8 [ERROR]Counter ID: 590 (SQ_INSTS_VALU_TRANS_F32) expected 1 instances and got 8 [ERROR]Counter ID: 591 (SQ_INSTS_VALU_ADD_F64) expected 1 instances and got 8 [ERROR]Counter ID: 592 (SQ_INSTS_VALU_MUL_F64) expected 1 instances and got 8 [ERROR]Counter ID: 593 
(SQ_INSTS_VALU_FMA_F64) expected 1 instances and got 8 [ERROR]Counter ID: 594 (SQ_INSTS_VALU_TRANS_F64) expected 1 instances and got 8 [ERROR]Counter ID: 595 (SQ_INSTS_VALU_INT32) expected 1 instances and got 8 [ERROR]Counter ID: 596 (SQ_INSTS_VALU_INT64) expected 1 instances and got 8 [ERROR]Counter ID: 597 (SQ_INSTS_VALU_CVT) expected 1 instances and got 8 [ERROR]Counter ID: 598 (SQ_INSTS_VALU_MFMA_I8) expected 1 instances and got 8 [ERROR]Counter ID: 599 (SQ_INSTS_VALU_MFMA_F16) expected 1 instances and got 8 [ERROR]Counter ID: 600 (SQ_INSTS_VALU_MFMA_BF16) expected 1 instances and got 8 [ERROR]Counter ID: 601 (SQ_INSTS_VALU_MFMA_F32) expected 1 instances and got 8 [ERROR]Counter ID: 602 (SQ_INSTS_VALU_MFMA_F64) expected 1 instances and got 8 [ERROR]Counter ID: 603 (SQ_INSTS_VALU_MFMA_MOPS_I8) expected 1 instances and got 8 [ERROR]Counter ID: 604 (SQ_INSTS_VALU_MFMA_MOPS_F16) expected 1 instances and got 8 [ERROR]Counter ID: 605 (SQ_INSTS_VALU_MFMA_MOPS_BF16) expected 1 instances and got 8 [ERROR]Counter ID: 606 (SQ_INSTS_VALU_MFMA_MOPS_F32) expected 1 instances and got 8 [ERROR]Counter ID: 607 (SQ_INSTS_VALU_MFMA_MOPS_F64) expected 1 instances and got 8 [ERROR]Counter ID: 608 (SQ_INSTS_MFMA) expected 1 instances and got 8 [ERROR]Counter ID: 609 (SQ_INSTS_VMEM_WR) expected 1 instances and got 8 [ERROR]Counter ID: 610 (SQ_INSTS_VMEM_RD) expected 1 instances and got 8 [ERROR]Counter ID: 611 (SQ_INSTS_VMEM) expected 1 instances and got 8 [ERROR]Counter ID: 612 (SQ_INSTS_SALU) expected 1 instances and got 8 [ERROR]Counter ID: 613 (SQ_INSTS_SMEM) expected 1 instances and got 8 [ERROR]Counter ID: 614 (SQ_INSTS_FLAT) expected 1 instances and got 8 [ERROR]Counter ID: 615 (SQ_INSTS_FLAT_LDS_ONLY) expected 1 instances and got 8 [ERROR]Counter ID: 616 (SQ_INSTS_LDS) expected 1 instances and got 8 [ERROR]Counter ID: 617 (SQ_INSTS_GDS) expected 1 instances and got 8 [ERROR]Counter ID: 618 (SQ_INSTS_EXP_GDS) expected 1 instances and got 8 [ERROR]Counter ID: 619 
(SQ_INSTS_BRANCH) expected 1 instances and got 8 [ERROR]Counter ID: 620 (SQ_INSTS_SENDMSG) expected 1 instances and got 8 [ERROR]Counter ID: 621 (SQ_INSTS_VSKIPPED) expected 1 instances and got 8 [ERROR]Counter ID: 622 (SQ_INST_LEVEL_VMEM) expected 1 instances and got 8 [ERROR]Counter ID: 623 (SQ_INST_LEVEL_SMEM) expected 1 instances and got 8 [ERROR]Counter ID: 624 (SQ_INST_LEVEL_LDS) expected 1 instances and got 8 [ERROR]Counter ID: 625 (SQ_VALU_MFMA_BUSY_CYCLES) expected 1 instances and got 8 [ERROR]Counter ID: 626 (SQ_WAVE_CYCLES) expected 1 instances and got 8 [ERROR]Counter ID: 627 (SQ_WAIT_ANY) expected 1 instances and got 8 [ERROR]Counter ID: 628 (SQ_WAIT_INST_ANY) expected 1 instances and got 8 [ERROR]Counter ID: 629 (SQ_ACTIVE_INST_ANY) expected 1 instances and got 8 [ERROR]Counter ID: 630 (SQ_ACTIVE_INST_VMEM) expected 1 instances and got 8 [ERROR]Counter ID: 631 (SQ_ACTIVE_INST_LDS) expected 1 instances and got 8 [ERROR]Counter ID: 632 (SQ_ACTIVE_INST_VALU) expected 1 instances and got 8 [ERROR]Counter ID: 633 (SQ_ACTIVE_INST_SCA) expected 1 instances and got 8 [ERROR]Counter ID: 634 (SQ_ACTIVE_INST_EXP_GDS) expected 1 instances and got 8 [ERROR]Counter ID: 635 (SQ_ACTIVE_INST_MISC) expected 1 instances and got 8 [ERROR]Counter ID: 636 (SQ_ACTIVE_INST_FLAT) expected 1 instances and got 8 [ERROR]Counter ID: 637 (SQ_INST_CYCLES_VMEM_WR) expected 1 instances and got 8 [ERROR]Counter ID: 638 (SQ_INST_CYCLES_VMEM_RD) expected 1 instances and got 8 [ERROR]Counter ID: 639 (SQ_INST_CYCLES_SMEM) expected 1 instances and got 8 [ERROR]Counter ID: 640 (SQ_INST_CYCLES_SALU) expected 1 instances and got 8 [ERROR]Counter ID: 641 (SQ_THREAD_CYCLES_VALU) expected 1 instances and got 8 [ERROR]Counter ID: 642 (SQ_IFETCH) expected 1 instances and got 8 [ERROR]Counter ID: 643 (SQ_IFETCH_LEVEL) expected 1 instances and got 8 [ERROR]Counter ID: 644 (SQ_LDS_BANK_CONFLICT) expected 1 instances and got 8 [ERROR]Counter ID: 645 (SQ_LDS_ADDR_CONFLICT) expected 1 instances and got 
8 [ERROR]Counter ID: 646 (SQ_LDS_UNALIGNED_STALL) expected 1 instances and got 8 [ERROR]Counter ID: 647 (SQ_LDS_MEM_VIOLATIONS) expected 1 instances and got 8 [ERROR]Counter ID: 648 (SQ_LDS_ATOMIC_RETURN) expected 1 instances and got 8 [ERROR]Counter ID: 649 (SQ_LDS_IDX_ACTIVE) expected 1 instances and got 8 [ERROR]Counter ID: 650 (SQ_ACCUM_PREV_HIRES) expected 1 instances and got 8 [ERROR]Counter ID: 651 (SQ_WAVES_RESTORED) expected 1 instances and got 8 [ERROR]Counter ID: 652 (SQ_WAVES_SAVED) expected 1 instances and got 8 [ERROR]Counter ID: 653 (SQ_INSTS_SMEM_NORM) expected 1 instances and got 8 [ERROR]Counter ID: 654 (SQC_DCACHE_INPUT_VALID_READYB) expected 1 instances and got 8 [ERROR]Counter ID: 655 (SQC_TC_REQ) expected 1 instances and got 8 [ERROR]Counter ID: 656 (SQC_TC_INST_REQ) expected 1 instances and got 8 [ERROR]Counter ID: 657 (SQC_TC_DATA_READ_REQ) expected 1 instances and got 8 [ERROR]Counter ID: 658 (SQC_TC_DATA_WRITE_REQ) expected 1 instances and got 8 [ERROR]Counter ID: 659 (SQC_TC_DATA_ATOMIC_REQ) expected 1 instances and got 8 [ERROR]Counter ID: 660 (SQC_TC_STALL) expected 1 instances and got 8 [ERROR]Counter ID: 661 (SQC_ICACHE_REQ) expected 1 instances and got 8 [ERROR]Counter ID: 662 (SQC_ICACHE_HITS) expected 1 instances and got 8 [ERROR]Counter ID: 663 (SQC_ICACHE_MISSES) expected 1 instances and got 8 [ERROR]Counter ID: 664 (SQC_ICACHE_MISSES_DUPLICATE) expected 1 instances and got 8 [ERROR]Counter ID: 665 (SQC_DCACHE_REQ) expected 1 instances and got 8 [ERROR]Counter ID: 666 (SQC_DCACHE_HITS) expected 1 instances and got 8 [ERROR]Counter ID: 667 (SQC_DCACHE_MISSES) expected 1 instances and got 8 [ERROR]Counter ID: 668 (SQC_DCACHE_MISSES_DUPLICATE) expected 1 instances and got 8 [ERROR]Counter ID: 669 (SQC_DCACHE_ATOMIC) expected 1 instances and got 8 [ERROR]Counter ID: 670 (SQC_DCACHE_REQ_READ_1) expected 1 instances and got 8 [ERROR]Counter ID: 671 (SQC_DCACHE_REQ_READ_2) expected 1 instances and got 8 [ERROR]Counter ID: 672 
(SQC_DCACHE_REQ_READ_4) expected 1 instances and got 8 [ERROR]Counter ID: 673 (SQC_DCACHE_REQ_READ_8) expected 1 instances and got 8 [ERROR]Counter ID: 674 (SQC_DCACHE_REQ_READ_16) expected 1 instances and got 8 [ERROR]Counter ID: 675 (TA_TA_BUSY) expected 16 instances and got 128 [ERROR]Counter ID: 676 (TA_TOTAL_WAVEFRONTS) expected 16 instances and got 128 [ERROR]Counter ID: 677 (TA_BUFFER_WAVEFRONTS) expected 16 instances and got 128 [ERROR]Counter ID: 678 (TA_BUFFER_READ_WAVEFRONTS) expected 16 instances and got 128 [ERROR]Counter ID: 679 (TA_BUFFER_WRITE_WAVEFRONTS) expected 16 instances and got 128 [ERROR]Counter ID: 680 (TA_BUFFER_ATOMIC_WAVEFRONTS) expected 16 instances and got 128 [ERROR]Counter ID: 681 (TA_BUFFER_TOTAL_CYCLES) expected 16 instances and got 128 [ERROR]Counter ID: 682 (TA_BUFFER_COALESCED_READ_CYCLES) expected 16 instances and got 128 [ERROR]Counter ID: 683 (TA_BUFFER_COALESCED_WRITE_CYCLES) expected 16 instances and got 128 [ERROR]Counter ID: 684 (TA_ADDR_STALLED_BY_TC_CYCLES) expected 16 instances and got 128 [ERROR]Counter ID: 685 (TA_ADDR_STALLED_BY_TD_CYCLES) expected 16 instances and got 128 [ERROR]Counter ID: 686 (TA_DATA_STALLED_BY_TC_CYCLES) expected 16 instances and got 128 [ERROR]Counter ID: 687 (TA_FLAT_WAVEFRONTS) expected 16 instances and got 128 [ERROR]Counter ID: 688 (TA_FLAT_READ_WAVEFRONTS) expected 16 instances and got 128 [ERROR]Counter ID: 689 (TA_FLAT_WRITE_WAVEFRONTS) expected 16 instances and got 128 [ERROR]Counter ID: 690 (TA_FLAT_ATOMIC_WAVEFRONTS) expected 16 instances and got 128 [ERROR]Counter ID: 691 (TD_TD_BUSY) expected 16 instances and got 128 [ERROR]Counter ID: 692 (TD_TC_STALL) expected 16 instances and got 128 [ERROR]Counter ID: 693 (TD_SPI_STALL) expected 16 instances and got 128 [ERROR]Counter ID: 694 (TD_LOAD_WAVEFRONT) expected 16 instances and got 128 [ERROR]Counter ID: 695 (TD_ATOMIC_WAVEFRONT) expected 16 instances and got 128 [ERROR]Counter ID: 696 (TD_STORE_WAVEFRONT) expected 16 instances and 
got 128 [ERROR]Counter ID: 697 (TD_COALESCABLE_WAVEFRONT) expected 16 instances and got 128 [ERROR]Counter ID: 698 (TCP_GATE_EN1) expected 16 instances and got 128 [ERROR]Counter ID: 699 (TCP_GATE_EN2) expected 16 instances and got 128 [ERROR]Counter ID: 700 (TCP_TD_TCP_STALL_CYCLES) expected 16 instances and got 128 [ERROR]Counter ID: 701 (TCP_TCR_TCP_STALL_CYCLES) expected 16 instances and got 128 [ERROR]Counter ID: 702 (TCP_READ_TAGCONFLICT_STALL_CYCLES) expected 16 instances and got 128 [ERROR]Counter ID: 703 (TCP_WRITE_TAGCONFLICT_STALL_CYCLES) expected 16 instances and got 128 [ERROR]Counter ID: 704 (TCP_ATOMIC_TAGCONFLICT_STALL_CYCLES) expected 16 instances and got 128 [ERROR]Counter ID: 705 (TCP_PENDING_STALL_CYCLES) expected 16 instances and got 128 [ERROR]Counter ID: 706 (TCP_TA_TCP_STATE_READ) expected 16 instances and got 128 [ERROR]Counter ID: 707 (TCP_VOLATILE) expected 16 instances and got 128 [ERROR]Counter ID: 708 (TCP_TOTAL_ACCESSES) expected 16 instances and got 128 [ERROR]Counter ID: 709 (TCP_TOTAL_READ) expected 16 instances and got 128 [ERROR]Counter ID: 710 (TCP_TOTAL_WRITE) expected 16 instances and got 128 [ERROR]Counter ID: 711 (TCP_TOTAL_ATOMIC_WITH_RET) expected 16 instances and got 128 [ERROR]Counter ID: 712 (TCP_TOTAL_ATOMIC_WITHOUT_RET) expected 16 instances and got 128 [ERROR]Counter ID: 713 (TCP_TOTAL_WRITEBACK_INVALIDATES) expected 16 instances and got 128 [ERROR]Counter ID: 714 (TCP_UTCL1_REQUEST) expected 16 instances and got 128 [ERROR]Counter ID: 715 (TCP_UTCL1_TRANSLATION_MISS) expected 16 instances and got 128 [ERROR]Counter ID: 716 (TCP_UTCL1_TRANSLATION_HIT) expected 16 instances and got 128 [ERROR]Counter ID: 717 (TCP_UTCL1_PERMISSION_MISS) expected 16 instances and got 128 [ERROR]Counter ID: 718 (TCP_TOTAL_CACHE_ACCESSES) expected 16 instances and got 128 [ERROR]Counter ID: 719 (TCP_TCP_LATENCY) expected 16 instances and got 128 [ERROR]Counter ID: 720 (TCP_TCC_READ_REQ_LATENCY) expected 16 instances and got 128 
[ERROR]Counter ID: 721 (TCP_TCC_WRITE_REQ_LATENCY) expected 16 instances and got 128 [ERROR]Counter ID: 722 (TCP_TCC_READ_REQ) expected 16 instances and got 128 [ERROR]Counter ID: 723 (TCP_TCC_WRITE_REQ) expected 16 instances and got 128 [ERROR]Counter ID: 724 (TCP_TCC_ATOMIC_WITH_RET_REQ) expected 16 instances and got 128 [ERROR]Counter ID: 725 (TCP_TCC_ATOMIC_WITHOUT_RET_REQ) expected 16 instances and got 128 [ERROR]Counter ID: 726 (TCP_TCC_NC_READ_REQ) expected 16 instances and got 128 [ERROR]Counter ID: 727 (TCP_TCC_NC_WRITE_REQ) expected 16 instances and got 128 [ERROR]Counter ID: 728 (TCP_TCC_NC_ATOMIC_REQ) expected 16 instances and got 128 [ERROR]Counter ID: 729 (TCP_TCC_UC_READ_REQ) expected 16 instances and got 128 [ERROR]Counter ID: 730 (TCP_TCC_UC_WRITE_REQ) expected 16 instances and got 128 [ERROR]Counter ID: 731 (TCP_TCC_UC_ATOMIC_REQ) expected 16 instances and got 128 [ERROR]Counter ID: 732 (TCP_TCC_CC_READ_REQ) expected 16 instances and got 128 [ERROR]Counter ID: 733 (TCP_TCC_CC_WRITE_REQ) expected 16 instances and got 128 [ERROR]Counter ID: 734 (TCP_TCC_CC_ATOMIC_REQ) expected 16 instances and got 128 [ERROR]Counter ID: 735 (TCP_TCC_RW_READ_REQ) expected 16 instances and got 128 [ERROR]Counter ID: 736 (TCP_TCC_RW_WRITE_REQ) expected 16 instances and got 128 [ERROR]Counter ID: 737 (TCP_TCC_RW_ATOMIC_REQ) expected 16 instances and got 128 Counter ID: 738 (TCA_CYCLE) expected 32 instances and got 32 Counter ID: 739 (TCA_BUSY) expected 32 instances and got 32 Counter ID: 740 (TCC_CYCLE) expected 32 instances and got 32 Counter ID: 741 (TCC_BUSY) expected 32 instances and got 32 Counter ID: 742 (TCC_REQ) expected 32 instances and got 32 Counter ID: 743 (TCC_STREAMING_REQ) expected 32 instances and got 32 Counter ID: 744 (TCC_NC_REQ) expected 32 instances and got 32 Counter ID: 745 (TCC_UC_REQ) expected 32 instances and got 32 Counter ID: 746 (TCC_CC_REQ) expected 32 instances and got 32 Counter ID: 747 (TCC_RW_REQ) expected 32 instances and got 32 
Counter ID: 748 (TCC_PROBE) expected 32 instances and got 32 Counter ID: 749 (TCC_PROBE_ALL) expected 32 instances and got 32 Counter ID: 750 (TCC_READ) expected 32 instances and got 32 Counter ID: 751 (TCC_WRITE) expected 32 instances and got 32 Counter ID: 752 (TCC_ATOMIC) expected 32 instances and got 32 Counter ID: 753 (TCC_HIT) expected 32 instances and got 32 Counter ID: 754 (TCC_MISS) expected 32 instances and got 32 Counter ID: 755 (TCC_WRITEBACK) expected 32 instances and got 32 Counter ID: 756 (TCC_EA_WRREQ) expected 32 instances and got 32 Counter ID: 757 (TCC_EA_WRREQ_64B) expected 32 instances and got 32 Counter ID: 758 (TCC_EA_WR_UNCACHED_32B) expected 32 instances and got 32 Counter ID: 759 (TCC_EA_WRREQ_STALL) expected 32 instances and got 32 Counter ID: 760 (TCC_EA_WRREQ_IO_CREDIT_STALL) expected 32 instances and got 32 Counter ID: 761 (TCC_EA_WRREQ_GMI_CREDIT_STALL) expected 32 instances and got 32 Counter ID: 762 (TCC_EA_WRREQ_DRAM_CREDIT_STALL) expected 32 instances and got 32 Counter ID: 763 (TCC_TOO_MANY_EA_WRREQS_STALL) expected 32 instances and got 32 Counter ID: 764 (TCC_EA_WRREQ_LEVEL) expected 32 instances and got 32 Counter ID: 765 (TCC_EA_ATOMIC) expected 32 instances and got 32 Counter ID: 766 (TCC_EA_ATOMIC_LEVEL) expected 32 instances and got 32 Counter ID: 767 (TCC_EA_RDREQ) expected 32 instances and got 32 Counter ID: 768 (TCC_EA_RDREQ_32B) expected 32 instances and got 32 Counter ID: 769 (TCC_EA_RD_UNCACHED_32B) expected 32 instances and got 32 Counter ID: 770 (TCC_EA_RDREQ_IO_CREDIT_STALL) expected 32 instances and got 32 Counter ID: 771 (TCC_EA_RDREQ_GMI_CREDIT_STALL) expected 32 instances and got 32 Counter ID: 772 (TCC_EA_RDREQ_DRAM_CREDIT_STALL) expected 32 instances and got 32 Counter ID: 773 (TCC_EA_RDREQ_LEVEL) expected 32 instances and got 32 Counter ID: 774 (TCC_TAG_STALL) expected 32 instances and got 32 Counter ID: 775 (TCC_NORMAL_WRITEBACK) expected 32 instances and got 32 Counter ID: 776 (TCC_ALL_TC_OP_WB_WRITEBACK) 
expected 32 instances and got 32 Counter ID: 777 (TCC_NORMAL_EVICT) expected 32 instances and got 32 Counter ID: 778 (TCC_ALL_TC_OP_INV_EVICT) expected 32 instances and got 32 Counter ID: 779 (TCC_EA_RDREQ_DRAM) expected 32 instances and got 32 Counter ID: 780 (TCC_EA_WRREQ_DRAM) expected 32 instances and got 32 [ERROR]Counter ID: 1893 (MeanOccupancyPerCU) expected 1 instances and got 8 [ERROR]Counter ID: 1894 (MeanOccupancyPerActiveCU) expected 1 instances and got 8 [ERROR]Counter ID: 1895 (TA_BUSY_avr) expected 16 instances and got 1 [ERROR]Counter ID: 1896 (TA_BUSY_max) expected 16 instances and got 1 [ERROR]Counter ID: 1897 (TA_BUSY_min) expected 16 instances and got 1 [ERROR]Counter ID: 1898 (TA_TA_BUSY_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1899 (TA_TOTAL_WAVEFRONTS_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1900 (TA_ADDR_STALLED_BY_TC_CYCLES_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1901 (TA_ADDR_STALLED_BY_TD_CYCLES_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1902 (TA_DATA_STALLED_BY_TC_CYCLES_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1903 (TA_FLAT_WAVEFRONTS_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1904 (TA_FLAT_READ_WAVEFRONTS_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1905 (TA_FLAT_WRITE_WAVEFRONTS_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1906 (TA_FLAT_ATOMIC_WAVEFRONTS_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1907 (TA_BUFFER_WAVEFRONTS_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1908 (TA_BUFFER_READ_WAVEFRONTS_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1909 (TA_BUFFER_WRITE_WAVEFRONTS_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1910 (TA_BUFFER_ATOMIC_WAVEFRONTS_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1911 (TA_BUFFER_TOTAL_CYCLES_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1912 (TA_BUFFER_COALESCED_READ_CYCLES_sum) expected 16 instances and got 1 [ERROR]Counter ID: 
1913 (TA_BUFFER_COALESCED_WRITE_CYCLES_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1914 (TD_TD_BUSY_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1915 (TD_TC_STALL_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1916 (TD_LOAD_WAVEFRONT_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1917 (TD_ATOMIC_WAVEFRONT_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1918 (TD_STORE_WAVEFRONT_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1919 (TD_COALESCABLE_WAVEFRONT_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1920 (TD_SPI_STALL_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1921 (TCP_GATE_EN1_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1922 (TCP_GATE_EN2_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1923 (TCP_TD_TCP_STALL_CYCLES_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1924 (TCP_TCR_TCP_STALL_CYCLES_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1925 (TCP_READ_TAGCONFLICT_STALL_CYCLES_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1926 (TCP_WRITE_TAGCONFLICT_STALL_CYCLES_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1927 (TCP_ATOMIC_TAGCONFLICT_STALL_CYCLES_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1928 (TCP_VOLATILE_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1929 (TCP_TOTAL_ACCESSES_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1930 (TCP_TOTAL_READ_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1931 (TCP_TOTAL_WRITE_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1932 (TCP_TOTAL_ATOMIC_WITH_RET_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1933 (TCP_TOTAL_ATOMIC_WITHOUT_RET_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1934 (TCP_TOTAL_WRITEBACK_INVALIDATES_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1935 (TCP_UTCL1_REQUEST_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1936 (TCP_UTCL1_TRANSLATION_MISS_sum) expected 16 instances and got 1 
[ERROR]Counter ID: 1937 (TCP_UTCL1_TRANSLATION_HIT_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1938 (TCP_UTCL1_PERMISSION_MISS_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1939 (TCP_TOTAL_CACHE_ACCESSES_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1940 (TCP_TCP_LATENCY_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1941 (TCP_TA_TCP_STATE_READ_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1942 (TCP_TCC_READ_REQ_LATENCY_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1943 (TCP_TCC_WRITE_REQ_LATENCY_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1944 (TCP_TCC_READ_REQ_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1945 (TCP_TCC_WRITE_REQ_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1946 (TCP_TCC_ATOMIC_WITH_RET_REQ_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1947 (TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1948 (TCP_TCC_NC_READ_REQ_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1949 (TCP_TCC_NC_WRITE_REQ_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1950 (TCP_TCC_NC_ATOMIC_REQ_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1951 (TCP_TCC_UC_READ_REQ_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1952 (TCP_TCC_UC_WRITE_REQ_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1953 (TCP_TCC_UC_ATOMIC_REQ_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1954 (TCP_TCC_CC_READ_REQ_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1955 (TCP_TCC_CC_WRITE_REQ_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1956 (TCP_TCC_CC_ATOMIC_REQ_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1957 (TCP_TCC_RW_READ_REQ_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1958 (TCP_TCC_RW_WRITE_REQ_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1959 (TCP_TCC_RW_ATOMIC_REQ_sum) expected 16 instances and got 1 [ERROR]Counter ID: 1960 (TCP_PENDING_STALL_CYCLES_sum) 
expected 16 instances and got 1 [ERROR]Counter ID: 1961 (TCA_CYCLE_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1962 (TCA_BUSY_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1963 (TCC_BUSY_avr) expected 32 instances and got 1 [ERROR]Counter ID: 1964 (TCC_WRREQ_STALL_max) expected 32 instances and got 1 [ERROR]Counter ID: 1965 (TCC_CYCLE_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1966 (TCC_BUSY_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1967 (TCC_REQ_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1968 (TCC_STREAMING_REQ_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1969 (TCC_NC_REQ_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1970 (TCC_UC_REQ_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1971 (TCC_CC_REQ_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1972 (TCC_RW_REQ_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1973 (TCC_PROBE_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1974 (TCC_PROBE_ALL_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1975 (TCC_READ_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1976 (TCC_WRITE_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1977 (TCC_ATOMIC_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1978 (TCC_HIT_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1979 (TCC_MISS_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1980 (TCC_WRITEBACK_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1981 (TCC_EA_WRREQ_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1982 (TCC_EA_WRREQ_64B_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1983 (TCC_EA_WR_UNCACHED_32B_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1984 (TCC_EA_WRREQ_STALL_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1985 (TCC_EA_WRREQ_IO_CREDIT_STALL_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1986 (TCC_EA_WRREQ_GMI_CREDIT_STALL_sum) expected 32 instances and got 1 
[ERROR]Counter ID: 1987 (TCC_EA_WRREQ_DRAM_CREDIT_STALL_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1988 (TCC_TOO_MANY_EA_WRREQS_STALL_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1989 (TCC_EA_WRREQ_LEVEL_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1990 (TCC_EA_RDREQ_LEVEL_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1991 (TCC_EA_ATOMIC_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1992 (TCC_EA_ATOMIC_LEVEL_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1993 (TCC_EA_RDREQ_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1994 (TCC_EA_RDREQ_32B_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1995 (TCC_EA_RD_UNCACHED_32B_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1996 (TCC_EA_RDREQ_IO_CREDIT_STALL_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1997 (TCC_EA_RDREQ_GMI_CREDIT_STALL_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1998 (TCC_EA_RDREQ_DRAM_CREDIT_STALL_sum) expected 32 instances and got 1 [ERROR]Counter ID: 1999 (TCC_TAG_STALL_sum) expected 32 instances and got 1 [ERROR]Counter ID: 2000 (TCC_NORMAL_WRITEBACK_sum) expected 32 instances and got 1 [ERROR]Counter ID: 2001 (TCC_ALL_TC_OP_WB_WRITEBACK_sum) expected 32 instances and got 1 [ERROR]Counter ID: 2002 (TCC_NORMAL_EVICT_sum) expected 32 instances and got 1 [ERROR]Counter ID: 2003 (TCC_ALL_TC_OP_INV_EVICT_sum) expected 32 instances and got 1 [ERROR]Counter ID: 2004 (TCC_EA_RDREQ_DRAM_sum) expected 32 instances and got 1 [ERROR]Counter ID: 2005 (TCC_EA_WRREQ_DRAM_sum) expected 32 instances and got 1 [ERROR]Counter ID: 2006 (FETCH_SIZE) expected 32 instances and got 1 [ERROR]Counter ID: 2007 (WRITE_SIZE) expected 32 instances and got 1 [ERROR]Counter ID: 2008 (WRITE_REQ_32B) expected 32 instances and got 1 [ERROR]Counter ID: 2009 (CU_OCCUPANCY) expected 1 instances and got 8 Counter ID: 2010 (CU_UTILIZATION) expected 1 instances and got 1 [ERROR]Counter ID: 2011 (TOTAL_16_OPS) expected 1 instances 
and got 8 [ERROR]Counter ID: 2012 (TOTAL_32_OPS) expected 1 instances and got 8 [ERROR]Counter ID: 2013 (TOTAL_64_OPS) expected 1 instances and got 8 Counter ID: 2014 (AggSysCycles) expected 1 instances and got 1 Counter ID: 2015 (GpuUtil) expected 1 instances and got 1 Counter ID: 2016 (CpUtil) expected 1 instances and got 1 Counter ID: 2017 (SpiUtil) expected 1 instances and got 1 Counter ID: 2018 (TaUtil) expected 1 instances and got 1 Counter ID: 2019 (TcUtil) expected 1 instances and got 1 Counter ID: 2020 (EaUtil) expected 1 instances and got 1 [ERROR]Counter ID: 2021 (InstrFetchLatency) expected 1 instances and got 8 [ERROR]Counter ID: 2022 (WaveOccupancy) expected 1 instances and got 8 [ERROR]Counter ID: 2023 (WaveDuration) expected 1 instances and got 8 [ERROR]Counter ID: 2024 (WaveDepWait) expected 1 instances and got 8 [ERROR]Counter ID: 2025 (WaveIssueWait) expected 1 instances and got 8 [ERROR]Counter ID: 2026 (WaveExec) expected 1 instances and got 8 [ERROR]Counter ID: 2027 (ValuIops) expected 1 instances and got 8 [ERROR]Counter ID: 2028 (MfmaFlops) expected 1 instances and got 8 [ERROR]Counter ID: 2029 (MfmaFlopsF16) expected 1 instances and got 8 [ERROR]Counter ID: 2030 (MfmaFlopsBF16) expected 1 instances and got 8 [ERROR]Counter ID: 2031 (MfmaFlopsF32) expected 1 instances and got 8 [ERROR]Counter ID: 2032 (MfmaFlopsF64) expected 1 instances and got 8 [ERROR]Counter ID: 2033 (ScaPipeIssueUtil) expected 1 instances and got 8 [ERROR]Counter ID: 2034 (ValuPipeIssueUtil) expected 1 instances and got 8 [ERROR]Counter ID: 2035 (VmemPipeIssueUtil) expected 1 instances and got 8 [ERROR]Counter ID: 2036 (MfmaUtil) expected 1 instances and got 8 [ERROR]Counter ID: 2037 (AvgNumActiveThreads) expected 1 instances and got 8 [ERROR]Counter ID: 2038 (VmemLatency) expected 1 instances and got 8 [ERROR]Counter ID: 2039 (SmemLatency) expected 1 instances and got 8 [ERROR]Counter ID: 2040 (LdsUtil) expected 1 instances and got 8 [ERROR]Counter ID: 2041 
(LdsPipeIssueUtil) expected 1 instances and got 8 [ERROR]Counter ID: 2042 (LdsLatency) expected 1 instances and got 8 [ERROR]Counter ID: 2043 (LdsBankConflict) expected 1 instances and got 8 [ERROR]Counter ID: 2044 (L1iCacheHitRate) expected 1 instances and got 8 [ERROR]Counter ID: 2045 (sL1dCacheHitRate) expected 1 instances and got 8 [ERROR]Counter ID: 2046 (vL1dBufCoalesceRate) expected 16 instances and got 1 [ERROR]Counter ID: 2047 (vL1dCacheUtil) expected 16 instances and got 1 [ERROR]Counter ID: 2048 (vL1dCacheTcbHitRate) expected 16 instances and got 1 [ERROR]Counter ID: 2049 (vL1dCacheWaveLatency) expected 16 instances and got 1 [ERROR]Counter ID: 2050 (vL1dReadFromL2Latency) expected 16 instances and got 1 [ERROR]Counter ID: 2051 (vL1dWriteToL2Latency) expected 16 instances and got 1 [ERROR]Counter ID: 2052 (vL1dRdTagConfStallRate) expected 16 instances and got 1 [ERROR]Counter ID: 2053 (vL1dWrTagConfStallRate) expected 16 instances and got 1 [ERROR]Counter ID: 2054 (vL1dAtomicTagConfStallRate) expected 16 instances and got 1 [ERROR]Counter ID: 2055 (vL1dMissReqStallRate) expected 16 instances and got 1 [ERROR]Counter ID: 2056 (vL1dDataPendRate) expected 16 instances and got 1 [ERROR]Counter ID: 2057 (vL1dDataRetStallRate) expected 16 instances and got 1 [ERROR]Counter ID: 2058 (L2CacheHitRate) expected 32 instances and got 1 [ERROR]Counter ID: 2059 (L2CacheTagRamStallRate) expected 32 instances and got 1 [ERROR]Counter ID: 2060 (EaRdLatency) expected 32 instances and got 1 [ERROR]Counter ID: 2061 (EaRdIoStallRate) expected 32 instances and got 1 [ERROR]Counter ID: 2062 (EaRdGmiStallRate) expected 32 instances and got 1 [ERROR]Counter ID: 2063 (EaRdDramStallRate) expected 32 instances and got 1 [ERROR]Counter ID: 2064 (EaWrLatency) expected 32 instances and got 1 [ERROR]Counter ID: 2065 (EaWrIoStallRate) expected 32 instances and got 1 [ERROR]Counter ID: 2066 (EaWrGmiStallRate) expected 32 instances and got 1 [ERROR]Counter ID: 2067 (EaWrDramStallRate) 
expected 32 instances and got 1 [ERROR]Counter ID: 2068 (EaWrStarveRate) expected 32 instances and got 1 [ERROR]Counter ID: 2069 (EaAtomicLatency) expected 32 instances and got 1 [ERROR]Counter ID: 2070 (TCP_TCP_TA_DATA_STALL_CYCLES_sum) expected 16 instances and got 1 [ERROR]Counter ID: 2071 (TCP_TCP_TA_DATA_STALL_CYCLES_max) expected 16 instances and got 1 [ERROR]Counter ID: 2072 (VFetchInsts) expected 16 instances and got 8 [ERROR]Counter ID: 2073 (VWriteInsts) expected 16 instances and got 8 [ERROR]Counter ID: 2074 (FlatVMemInsts) expected 1 instances and got 8 [ERROR]Counter ID: 2075 (LDSInsts) expected 1 instances and got 8 [ERROR]Counter ID: 2076 (FlatLDSInsts) expected 1 instances and got 8 [ERROR]Counter ID: 2077 (VALUUtilization) expected 1 instances and got 8 [ERROR]Counter ID: 2078 (VALUBusy) expected 1 instances and got 8 [ERROR]Counter ID: 2079 (SALUBusy) expected 1 instances and got 8 [ERROR]Counter ID: 2080 (FetchSize) expected 32 instances and got 1 [ERROR]Counter ID: 2081 (WriteSize) expected 32 instances and got 1 [ERROR]Counter ID: 2082 (MemWrites32B) expected 32 instances and got 1 [ERROR]Counter ID: 2083 (L2CacheHit) expected 32 instances and got 1 [ERROR]Counter ID: 2084 (MemUnitStalled) expected 16 instances and got 1 [ERROR]Counter ID: 2085 (WriteUnitStalled) expected 32 instances and got 1 [ERROR]Counter ID: 2086 (LDSBankConflict) expected 1 instances and got 8 * source formatting (clang-format v11) (#225) Co-authored-by: bwelton <bwelton@users.noreply.github.com> * cmake formatting (cmake-format) (#224) Co-authored-by: bwelton <bwelton@users.noreply.github.com> * Minor fixes * source formatting (clang-format v11) (#226) Co-authored-by: bwelton <bwelton@users.noreply.github.com> * Minor test change --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: bwelton <bwelton@users.noreply.github.com> |
||
|
|
ca296ff22b |
Remove _service from rocprofiler_service_* types (#221)
- this is a continuation of #168 which removed _SERVICE from the ROCPROFILER_SERVICE_ enums |
||
|
|
cf5e4b4b1b |
Integration Testing (#211)
* Add external/cereal submodule
- used for integration testing
* Update lib/common/container/small_vector.hpp
- documentation notes
* Update tests/apps
- update transpose app (fix build)
- add reproducible-runtime app
* Update include/rocprofiler/fwd.h
- rocprofiler_service_callback_phase_t -> rocprofiler_callback_phase_t
* Update PTL submodule
- fix for task group: submitting tasks from different thread
* Update lib/rocprofiler/hsa/queue.cpp
- CHECK_NOTNULL(_buffer)
* Update lib/rocprofiler/hsa/hsa.cpp
- use buffer::get_buffer instead of manually looking for buffer
* Update lib/rocprofiler/internal_threading.cpp
- use buffer::get_buffer instead of manually looking for buffer
* Update lib/rocprofiler/buffer.cpp
- offset the buffer id
- properly handle rocprofiler_create_buffer reusing rocprofiler_buffer_id_t on a different context
* Update tests
- kernel tracing library for integration testing
* Add cereal submodule
* Update lib/rocprofiler/registration.*
- OnUnload
- Support ROCP_TOOL_LIBRARIES for python usage
- improve finalize function
- remove calling hsa_shut_down in finalize function
* Update lib/rocprofiler/buffer.*
- allocate_buffer sets the buffer id value
- expose (internally) is_valid_buffer_id
- update test
* Update tests/kernel-tracing
- installation
- better organization of JSON groups
- improved messaging
* Update lib/rocprofiler/registration.cpp
- add workaround for hsa-runtime supporting rocprofiler-register
* Update tests/kernel-tracing/kernel-tracing.cpp
- fix memory leaks
* cereal support for minimal JSON
- update cereal submodule to rocprofiler branch
- change REPO_BRANCH in rocprofiler_checkout_git_submodule for cereal
- update tests/kernel-tracing/kernel-tracing.cpp
- use minimal json
- slight tweak putting giving contexts name in storing name + context pointer pair in map
* Update tests/kernel-tracing/kernel-tracing.cpp
- support runtime selection of contexts via KERNEL_TRACING_CONTEXTS environment variable
* Update tests
- tests/CMakeLists.txt
- find_package(Python3 REQUIRED)
- tests/kernel-tracing
- pytest validation
* Update CI workflow
- install pytest
- add checks for test labels
* Update scripts/run-ci.py
- change --coverage options
- replace 'unittests' with 'tests'
- replace test label regex '-L unittests' with '-L tests'
* Update requirements.txt
- this is now an empty file since none of the packages are required for this repo
|
||
|
|
086218c2eb |
Fixes licensing in files (#206)
* Update LICENSE
- fix inconsistencies
* Revert lib/rocprofiler/counters/parser/scanner.cpp
* Update lib/rocprofiler/counters/tests/dimension.cpp
- revert ending curly brace
* Revert missing curly braces
- missing curly braces when file did not end with a new line
|
||
|
|
3082288a25 |
Code object, kernel dispatch, and memory copy tracing (#177)
* Update samples/api_buffered_tracing
- external correlation id
- support ROCPROFILER_BUFFER_TRACING_KERNEL_DISPATCH
* Update lib/rocprofiler/context.cpp
- update alternative get_active_contexts paradigm
* Update lib/rocprofiler/external_correlation.cpp
- inherit correlation id from main thread
* Update lib/rocprofiler/hsa/queue.*
- typedef changes
- rocprofiler_packet union
- modify Queue::queue_info_session_t
- use rocprofiler_packet
- add thread id
- add kernel id
- add correlation id
- out of line definitions
- AsyncSignalHandler function update
- handle kernel dispatch tracing
- Move CreateBarrierPacket and AddVendorSpecificPacket to lambdas
- handle contexts
* Update lib/rocprofiler/hsa/hsa.cpp
- remove unnecessary log function
- use new get_active_contexts paradigm
- use new correlation id updates
* Update AgentCache and kernel dispatch record
- include const rocprofiler_agent_t* in rocprofiler_buffer_tracing_kernel_dispatch_record_t
- AgentCache::get_rocp_agent returns const pointer
* Replace ROCPROFILER_SERVICE_ with ROCPROFILER_
* source formatting
* Code Object Tracing
- include/rocprofiler/callback_tracing.h
- remove rocprofiler_callback_tracing_code_object_unload_data_t
- remove rocprofiler_callback_tracing_code_object_kernel_symbol_register_data_t
- include/rocprofiler/fwd.h
- remove ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT_UNLOAD
- remove ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_UNREGISTER
- lib/common/utility.hpp
- assert_public_api_struct_properties()
- init_public_api_struct(...)
- lib/rocprofiler/registration.cpp
- invoke hsa::code_object_init
- lib/rocprofiler/hsa/CMakeLists.txt
- compile code_object code
- lib/rocprofiler/hsa/code_object.{hpp,cpp}
- tracing code object load/unload
- lib/rocprofiler/hsa/queue.cpp
- get_kernel_id
* Update lib/rocprofiler/hsa/hsa.cpp
- fix should_wrap_functor logic (which was not handling callback_tracer + buffered_tracer properly)
* Update lib/rocprofiler/hsa/queue.cpp
- fix rocprofiler_buffer_tracing_kernel_dispatch_record_t construction
* Update samples/api_buffered_tracing/client.cpp
- print kernel names
* Move samples/apps to tests/apps
* Update lib/rocprofiler/hsa/code_object.cpp
- ensure unload callbacks when application is exiting
- support user data in between load/unload callbacks
* Update lib/rocprofiler/hsa/queue.{hpp,cpp}
- store contexts and external correlation ids in queue_info_session
- reduce signal_limiter to 96 to fix hangs
- fix support for kernel tracing and async memory copies
* Add lib/common/scope_destructor.hpp
- similar to static_cleanup_wrapper but different
* Update include/rocprofiler/buffer_tracing.h
- update rocprofiler_buffer_tracing_memory_copy_record_t
- remove operation: user can figure that out from correlation id
- add kernel id
- add rocprofiler agent id
* Update include/rocprofiler/callback_tracing.h
- fix data type of load_delta field in code object
- remove rocp_agent from kernel_symbol_register_data_t (known via code_object_id)
* Add samples/code_object_tracing
- sample demonstrating code object tracing
* Update samples
- minor tweak to print_call_stack
* Update lib/rocprofiler/hsa/code_object.cpp
- flip ordering of unload callbacks for code object unloading and kernel symbol deregistering
* clang-tidy fixes
* Update lib/rocprofiler/hsa/code_object.cpp
- fix heap-use-after-free issue with code object
* Update include/rocprofiler/external_correlation.h
- update documentation to include info about default value of external correlation value
* Use common::container::small_vector for contexts
- small_vector<const context*> is an ideal data structure for array of active contexts
* Update context handling for code object unload
- code object unload is only called for contexts which received the load callback
* Update samples
- improve ROCPROFILER_CALL macro to include status string
- api_buffered_tracing handles ROCPROFILER_STATUS_ERROR_BUFFER_BUSY
* Code object shutdown
- ensure code object callbacks are invoked prior to finalizing
* Update lib/common (memory allocators)
- added lib/common/memory folder with allocators
* Add lib/rocprofiler/allocator.*
- rocprofiler::allocator::static_data_allocator
- special allocator for static data which finalizes before any data gets destroyed
- rocprofiler::allocator::unique_static_ptr_t
- unique_ptr that uses static data deleter (ensure finalize is called)
* Update lib/rocprofiler/buffer.cpp
- flush checks fini status
- use unique_static_ptr_t
* Update lib/rocprofiler/internal_threading.*
- change meaning of thread_pool_t and task_group_t
- improve finalization to prevent data races and heap-use-after-free
* Update lib/rocprofiler/registration.cpp
- use static_data_allocator for client_library vector
* Update lib/rocprofiler/context/context.*
- use allocator::unique_static_ptr_t
* Update lib/rocprofiler/allocator.cpp
- avoid deadlock in deleter<static_data>::operator()
* Update lib/rocprofiler/registration.cpp
- avoid deadlock in rocprofiler::registration::finalize()
* Update lib/rocprofiler/hsa/code_object.cpp
- suppress duplicate reporting of code-object/kernel-symbol load/unload
* Update leak sanitizer suppressions
- __new_exitfn (via stdlib/cxa_atexit.c) leaks
|
||
|
|
55f2dabbb3 |
Generalized updates (#174)
- include/rocprofiler/agent.h
- move rocprofiler_dim3_t
- include/rocprofiler/buffer_tracing.h
- size fields
- update kernel dispatch record
- include/rocprofiler/callback_tracing.h
- remove rocprofiler_callback_tracing_code_object_unload_data_t
- remove rocprofiler_callback_tracing_code_object_register_host_kernel_symbol_data_t
- include/rocprofiler/fwd.h
- added ROCPROFILER_STATUS_ERROR_CONTEXT_CONFLICT
- remove ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT_UNLOAD
- remove ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_UNREGISTER
- add rocprofiler_kernel_id_t typedef
- add rocprofiler_dim3_t (moved from agent.h)
- lib/common/synchronized.hpp
- rlock/wlock return decltype(auto)
- separate prototype from definition
- lib/common/utility.{hpp,cpp}
- timestamp functions replicating HSA implementation(s)
- init_public_api_struct for setting size field and ensuring certain type traits
- simplified static_cleanup_wrapper
- separate prototype from definition in active_capacity_gate
- lib/rocprofiler/agent.cpp
- tweak get_rocprofiler_agent impl
- lib/rocprofiler/buffer.cpp
- fix buffer message log level
- lib/rocprofiler/context.cpp
- use new paradigm for getting active contexts
- lib/rocprofiler/internal_threading.hpp
- update to simplified static_cleanup_wrapper implementation
- lib/rocprofiler/registration.cpp
- fix deactivating contexts
- lib/rocprofiler/rocprofiler.cpp
- status string for context conflict
- lib/rocprofiler/context/context.*
- correlation_id struct
- new get_active_contexts paradigm
- lib/rocprofiler/counters/core.*
- rocprofiler_packet union
- tweak start/stop context to accept pointer instead of handle
- lib/rocprofiler/counters/dimensions.cpp
- update to new get_rocp_agent() return type
- lib/rocprofiler/hsa/hsa.*
- update to new get_active_contexts paradigm
- update to new correlation id implementation
- guard against hsa.def.cpp direct compilation
- lib/rocprofiler/hsa/queue_controller.*
- update to change in get_rocp_agent return type
- consistent aliases
- lookup function for getting queue pointer from hsa queue id
- lib/rocprofiler/hsa/queue.*
- rocprofiler_packet
- extend queue_info_session_t
- lib/rocprofiler/tests/registration.cpp
- improve diagnostic on perf check for rocprofiler_lib.callback_registration_lambda_with_result
|
||
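The `init_public_api_struct` note in the commit above ("setting size field and ensuring certain type traits") can be pictured with a minimal sketch. Everything here is hypothetical: `example_record_t` and `init_struct_sketch` are illustrative stand-ins, not the library's actual types or signature, and the sketch assumes the size field is the first member.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical public struct whose first field records its own size,
// so a consumer can detect which version of the struct it was handed.
struct example_record_t
{
    std::uint64_t size;
    std::uint64_t value;
};

// Illustrative helper (not the library's function): checks a few type
// traits at compile time, zero-initializes the struct, then fills `size`.
template <typename Tp>
Tp& init_struct_sketch(Tp& val)
{
    static_assert(std::is_standard_layout<Tp>::value, "must be standard layout");
    static_assert(std::is_trivially_copyable<Tp>::value, "must be trivially copyable");
    static_assert(offsetof(Tp, size) == 0, "size must be the first field");
    val      = Tp{};
    val.size = sizeof(Tp);
    return val;
}
```

Usage would look like `example_record_t r; init_struct_sketch(r);`, after which `r.size` equals `sizeof(example_record_t)` and all other fields are zero.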
|
|
63775f241a |
Evaluation portion for metrics (#123)
* EvaluateAST and validation of RawAST
* Adding MetricDimension class and concepts
* set_dimensions() and improved ValidateRawAST()
* source formatting (clang-format v11) (#124)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* Addressing 1st round of review comments
* Modified the parser production rules to support the right syntax for REDUCE and SELECT derived metric expressions
* changes to raw_ast.hpp and fmt::format()
* Parser tests updated to support corrected REDUCE and SELECT syntax
* changes to EvaluateAST::set_dimensions() and other dimension related code changes
* Added a test for EvaluateAST::evaluate() to test basic arithmetic on EvaluateAST
* Format source code (via clang-format v11) on sauverma/evaluate-ast (#146)
* source formatting (clang-format v11)
* Add dimension information to counter record
Restructures counter records to have the following design: rocprofiler_record_id_t, which is an int64_t that encodes both the counter id and the dimension information for the record. The first 16 bits are reserved for the counter id, while the last 48 are split among the dimensions specified in rocprofiler_dimension_t (currently 8 bits per dimension). The 8 bits for each dimension store the value of that dimension for this record (e.g. a value of 8 on dimension XCC denotes XCC[8] for the counter). The split among the dimensions automatically adjusts as dimensions are added or removed.
The record also contains a union of {int64_t hw_counter, double derived_counter} to specify the value of the record at rocprofiler_record_id_t. The int64_t denotes a physical hardware counter with an integer type, while the double is used for derived counters (which type a counter's values have must be queried separately).
* Integration of new id type + other fixes
---------
Co-authored-by: sauverma93 <sauverma93@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
* Fixed issues with reduce() implementation and added a test for reduce()
* Updated parser syntax for reduce() and updated the parser test. Disabled the test for select()
* Build warning fixes
* Modifications to support fetching xcc/etc info from agent
* Initial plumbing working for single counters, cleanup+tests still needed
* Remove string comparison from reduce ops
* source formatting (clang-format v11) (#163)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* cmake formatting (cmake-format) (#164)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* source formatting (clang-format v11) (#171)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* Merged with master
* source formatting (clang-format v11) (#172)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* source formatting (clang-format v11) (#173)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* Test fix
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
Co-authored-by: sauverma93 <sauverma93@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
|
||
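The record id layout described in the commit above (16 bits of counter id plus 8 bits per dimension) can be sketched roughly as follows. This is only an illustration under stated assumptions: the commit message does not say which end "first" refers to, so the sketch places the counter id in the upper 16 bits; it assumes six dimensions (48 / 8); it uses an unsigned 64-bit integer to keep the shifts well defined even though the message says int64_t; and every name in it is hypothetical rather than the library's own.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for rocprofiler_record_id_t (unsigned here to keep shifts well defined).
using record_id_t = std::uint64_t;

constexpr unsigned kDimBits      = 8;                    // bits per dimension, per the commit message
constexpr unsigned kNumDims      = 6;                    // assumed: 48 bits / 8 bits per dimension
constexpr unsigned kCounterShift = kDimBits * kNumDims;  // assumed: counter id sits above the dimensions

// Build an id that carries only the counter id; all dimension slots are zero.
constexpr record_id_t make_record_id(std::uint16_t counter_id)
{
    return static_cast<record_id_t>(counter_id) << kCounterShift;
}

// Store `pos` in the 8-bit slot of dimension `dim` (0 <= dim < kNumDims).
constexpr record_id_t set_dim(record_id_t id, unsigned dim, std::uint8_t pos)
{
    return (id & ~(record_id_t{0xFF} << (dim * kDimBits))) |
           (static_cast<record_id_t>(pos) << (dim * kDimBits));
}

constexpr std::uint16_t get_counter_id(record_id_t id)
{
    return static_cast<std::uint16_t>(id >> kCounterShift);
}

constexpr std::uint8_t get_dim(record_id_t id, unsigned dim)
{
    return static_cast<std::uint8_t>((id >> (dim * kDimBits)) & 0xFF);
}

// Hypothetical record: the id plus a value that is either a hardware
// counter (integer) or a derived counter (floating point).
struct counter_record_sketch_t
{
    record_id_t id;
    union
    {
        std::int64_t hw_counter;
        double       derived_counter;
    };
};

// Round trip: counter 777 at position 8 of (hypothetical) dimension slot 1.
static_assert(get_counter_id(set_dim(make_record_id(777), 1, 8)) == 777, "counter id survives");
static_assert(get_dim(set_dim(make_record_id(777), 1, 8), 1) == 8, "dimension value survives");
```

Packing both pieces into one integer keeps each record small and lets a consumer recover the counter id and the per-dimension position without a side table; the cost is the fixed 8-bit range per dimension.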
|
|
4f2dc896d3 |
Support Tool Intercept API Tables (#165)
* Update include/rocprofiler
- intercept_table.h header
- generic rocprofiler_runtime_library_t
- rocprofiler_internal_thread_library_t is not typedef for rocprofiler_runtime_library_t
- rocprofiler_at_runtime_api_registration
* Update lib/rocprofiler
- minor tweaks to context.cpp
- check if none context early
- disallow stop_context when finalizing
- add intercept_table.hpp and intercept_table.cpp
- implement rocprofiler_at_runtime_api_registration
- implement notify_runtime_api_registration
- update registration.cpp
- invoke notify_runtime_api_registration
- tweak to fini status when invoking client finalizer
* Update lib/rocprofiler/tests
- add tests for intercept table
* Add samples/intercept_table
- demonstrate how to install custom API function wrappers instead of relying on HSA callback tracing
* Update lib/rocprofiler/tests/intercept_table.cpp
- remove _SERVICE from ROCPROFILER_SERVICE_
* Update include/rocprofiler/intercept_table.h
- Update doxygen comments
* Update lib/rocprofiler/intercept_table.cpp
- return error config locked if already initialized
* Update lib/rocprofiler/intercept_table.cpp
- remove unnecessary alias
* Apply suggestions from code review
Co-authored-by: Tony Tye <Tony.Tye@amd.com>
* Update doxygen comments
- clarify when rocprofiler_at_runtime_api_registration can be invoked
* Use rocprofiler_runtime_library_t for intercept table and internal threading
- remove rocprofiler_intercept_library_t alias to rocprofiler_runtime_library_t
- remove rocprofiler_internal_thread_library_t alias to rocprofiler_runtime_library_t
- move around documentation with regard to rocprofiler_runtime_library_t enumeration
- added some extra doxygen documentation to internal threading functions
---------
Co-authored-by: Tony Tye <Tony.Tye@amd.com>
|
||
|
|
14373c57be |
Doxygen Improvements (#170)
* Doxygen updates
- Fix multiple @param where [in]/[out] was misplaced
- Fix @return
- Insert @retval
- Separate out installing conda environment from build docs step
|
||
|
|
033fd941e0 |
Remove SERVICE_ from ROCPROFILER_SERVICE_* enum vals (#168)
- these are unnecessary and inconsistent with the naming convention of everything else related to callback tracing |
||
|
|
cfbea0e5eb |
Update include/rocprofiler and lib/rocprofiler (#166)
- renamed inconsistent callback tracing types
- updated HIP and Marker API data structures (resemble HSA)
- cleaned up api_args.h and api_id.h headers
- cleaned up hsa.h, hip.h, and marker.h headers
- update to use (more consistent) name changes
- update code object data structs
- ROCPROFILER_SERVICE_CALLBACK_PHASE_{LOAD,UNLOAD} equivalent to ENTER, EXIT respectively
|
||
|
|
7f631de401 |
Separate agent cache from queue controller (#145)
* Update lib/rocprofiler/agent.{hpp,cpp}
- get_agents() function for internal access to agent pointers
* Update AgentCache
- make member variables and member functions clearly distinguish between HSA agent and rocprofiler agent
* Change ctor of AgentCache
* Update lib/rocprofiler/hsa/queue_controller.cpp
- QueueController::init uses agent::get_agent_cache
* Update lib/rocprofiler/hsa/agent_cache.*
- member function to get index
- operator== for rocprofiler_agent_t and hsa_agent_t
- removed hsa_iterate_agents from ctor (now in agent.cpp)
* Update lib/rocprofiler/agent.*
- construct_agent_cache function
- functions for rocprofiler agent <-> HSA agent
- functions for getting agent cache
* Update lib/rocprofiler/registration.cpp
- invoke construct_agent_cache when HSA table is received
* Update lib/rocprofiler/agent.cpp
- loosen failure conditions
- handle spurious duplicate entry warning
* Update lib/rocprofiler/agent.cpp
- improve read_map diagnostics
* Update lib/rocprofiler/agent.cpp
- avoid infinite loop in read_map
* Update lib/rocprofiler/agent.cpp
- handle empty kfd node properties file
* Update lib/rocprofiler/agent.cpp
- check for permissions to read a node properties file
* Update lib/rocprofiler/agent.cpp
- more checks on file readability
* Update lib/rocprofiler/tests/agent.cpp
- print virtual kfd topology
* Update lib/rocprofiler/tests/agent.cpp
- verify id.handle == hsa_agent internal node id
* Update lib/rocprofiler/tests/agent.cpp
- check node_id
- check location id
- check device id
- update abi test
* Update include/rocprofiler/agent.h
- add node_id field
- add reserved0 field to ensure new field increases struct size
* Update lib/rocprofiler/agent.cpp
- node_id instead of id.handle
* Update lib/rocprofiler/agent_cache.cpp
- node_id instead of id.handle
* Update samples/pc_sampling
- node_id for agent instead of id.handle
* Update lib/rocprofiler/buffer.cpp
- remove debug prints
|
||
|
|
87cc748c3d |
Query callback and buffered tracing names (#135)
* Update include/rocprofiler/buffer_tracing.h
- add query functions for kind name, and kind operation name
- tweak iterate functions to not be specifically dedicated to names
* Update include/rocprofiler/callback_tracing.h
- add query functions for kind name, and kind operation name
- tweak iterate functions to not be specifically dedicated to names
* Update lib/rocprofiler/callback_tracing.cpp
- implement rocprofiler_query_callback_tracing_kind_name
- implement rocprofiler_query_callback_tracing_kind_name_buf
- implement rocprofiler_query_callback_tracing_kind_operation_name
- implement rocprofiler_query_callback_tracing_kind_operation_name_buf
- implement rocprofiler_iterate_callback_tracing_kinds
- implement rocprofiler_iterate_callback_tracing_kind_operations
* Update lib/rocprofiler/buffer_tracing.cpp
- implement rocprofiler_query_buffer_tracing_kind_name
- implement rocprofiler_query_buffer_tracing_kind_name_buf
- implement rocprofiler_query_buffer_tracing_kind_operation_name
- implement rocprofiler_query_buffer_tracing_kind_operation_name_buf
- implement rocprofiler_iterate_buffer_tracing_kinds
- implement rocprofiler_iterate_buffer_tracing_kind_operations
* Update lib/rocprofiler/tests/registration.cpp
- use new implementation for getting callback/buffer tracing names
* Update samples/api_buffered_tracing
- use new implementation for getting callback/buffer tracing names
* Update samples/api_callback_tracing
- use new implementation for getting callback/buffer tracing names
* Remove buffered query functions
- *_buf variants of the rocprofiler_query_X_tracing_Y functions were removed since we currently have no names requiring these functions
* Rename ROCPROFILER_STATUS_ERROR_DOMAIN_NOT_FOUND
- "DOMAIN" changed to "KIND" since former is more specific tracing whereas kind is used more generically
|
||
|
|
6a3f79e626 |
Update correlation id definition + status strings + const active contexts (#127)
* Update include/rocprofiler
- remove rocprofiler_external_correlation_id_t
- redefine rocprofiler_correlation_id_t to include internal id and external user data
- associate rocprofiler_push_external_correlation_id and rocprofiler_pop_external_correlation_id with a context
* Update include/rocprofiler/rocprofiler.h
- rocprofiler_get_status_name
- rocprofiler_get_status_string
* Update lib/rocprofiler/rocprofiler.cpp
- implement rocprofiler_get_status_name and rocprofiler_get_status_string
* Update lib/rocprofiler/tests/status.cpp
- unit test for status string and name
* Update lib/rocprofiler/tests/registration.cpp
- update to new rocprofiler_correlation_id_t
* Update samples
- update to new rocprofiler_correlation_id_t
* Add lib/rocprofiler/external_correlation.cpp
- placeholder for external correlation push/pop
* Update lib/rocprofiler/hsa/agent_cache.cpp
- slight tweak to when HSA_AMD_AGENT_INFO_NEAREST_CPU is defined
* Update context implementation and hsa.cpp
- get_active_contexts is array of const context pointers
- update hsa_api_impl<Idx>::functor to new rocprofiler_correlation_id_t
* Update include/rocprofiler/fwd.h
- add ROCPROFILER_STATUS_ERROR_INVALID_ARGUMENT
- reorder enum for consistency
* Update include/rocprofiler/external_correlation.h
- doxygen comments
- thread id parameter
* Update include/rocprofiler/rocprofiler.h
- add rocprofiler_get_thread_id function (needed for external corr id)
* Update lib/common/synchronized.hpp
- explicit LockedType
- define all copy/move ctor and assignment
- update rlock/wlock/ulock to support arguments and return values
- Support additional template parameter for special case of synchronized instance which is the mapped type of a synchronized map
* Update lib/rocprofiler/external_correlation.cpp
- implement rocprofiler_{push,pop}_external_correlation_id
* Update lib/rocprofiler/CMakeLists.txt
- external_correlation.hpp
* Update lib/rocprofiler/rocprofiler.cpp
- status string for ROCPROFILER_STATUS_ERROR_INVALID_ARGUMENT
- implement rocprofiler_get_thread_id
* Update lib/rocprofiler/tests (external correlation)
- add external_correlation unit tests
* Update include/rocprofiler/callback_tracing.h
- doxygen comments
- callback invoked in callback tracing has user_data pointer passed to it
* Update samples/api_callback_tracing/client.cpp
- add rocprofiler_user_data_t* to tool_tracing_callback
* Update lib/rocprofiler/tests/registration.cpp
- add rocprofiler_user_data_t* to tool_tracing_callback
* Update lib/rocprofiler/context/context.{hpp,cpp}
- update correlation_tracing_service
- external_correlation instance
- rename get_unique_record_id to get_unique_internal_id
* Update lib/tests/common/demangling.cpp
- tweak mangled definitions due to changing function get_unique_record_id to get_unique_internal_id
* Update lib/rocprofiler/hsa/hsa.cpp
- handle updates to external correlation id
- handle updates to callback signature in callback tracing
* Update CMakeLists.txt
- CMAKE_BUILD_TYPE=Coverage defines CODECOV=1
* Update samples/api_callback_tracing/client.cpp
|
||
|
|
d1518c65b2 |
Miscellaneous Updates (const-correctness, logic fixes, etc.) (#126)
* Update lib/rocprofiler/hsa/hsa.cpp
- fix logic for constructing callback_contexts and buffered_contexts arrays
* Update include/rocprofiler/{agent,fwd,pc_sampling}.h
- remove rocprofiler_pc_sampling_config_array_t due to const problems
- update rocprofiler_agent_t to use arrays to const data
- remove redundant rocprofiler_query_pc_sampling_agent_configurations
- this implementation is quite literally looking up info in the agent struct that was passed
* Update lib/rocprofiler/pc_sampling.cpp
- remove rocprofiler_query_pc_sampling_agent_configurations
* update lib/rocprofiler/agent.cpp
- handle const fields
- make mi200_pc_sampling_config variable static
* Update lib/rocprofiler/tests/agent.cpp
- tweak to pc_sampling_configs offset
* Update samples/pc_sampling
- Update sample to reflect minor tweaks to pc_sampling_configs in rocprofiler_agent_t
* Update CI workflow
- remove 'if: ${{ always() }}'
- I suspect this is why the jobs do not cancel in progress correctly
|
||
|
|
010693b795 |
Agent, Counters, and AQL (#55)
* Migrate XML counter defs and reader from v1/v2
* Current Working Set
* Modified parser
* Evaluate AST Start
* Update lib/common/xml
- move definitions out of class declaration
* Update lib/rocprofiler/counters/parser
- update build of bison and flex build
- reproducible generation
- add ROCPROFILER_REGENERATE_COUNTERS_PARSER option
- fix namespacing
* Update lib/rocprofiler/counters/xml
- change location of XML files and install them
* Update lib/rocprofiler/counter/tests
- normalize the test names
- improve test failures (more clear about where failure is)
* Update lib/rocprofiler/counters
- fix namespace
- update to new XML metrics directory
* Update lib/rocprofiler/CMakeLists.txt
- link to object library
* Update lib/rocprofiler/hsa/types.hpp
- reorganize includes
* Add metric loading class/printers
* Agent Implementation
* Queue Implementation (#79)
* Queue Implementation
* API Implementation For Counters (part 1) (#80)
* API Implementation For Counters
* Bewelton/counter collection 3 (#84)
* Added counter sample
* More changes
* More changes
* Update samples/counter_collection
- mostly formatting
* Update include/rocprofiler/counters.h
- formatting
* Add lib/common/synchronized.hpp
- Synchronized struct
* Update lib/rocprofiler/counters/xml/basic_counters.xml
- whitespace
* Update scripts/patch-parser.cmake
- tweaks for consistency
* Update lib/rocprofiler/counters/parser/tests/parser_tests.cpp
- formatting
* Update lib/rocprofiler/counters/parser
- improve consistency in rocprofiler-expr-parser-patch
- update parser.{h,cpp} and scanner.cpp
- formatting + regenerated
* Update lib/rocprofiler/aql
- formatting
- clang-tidy fixes
- guard against memory pool access errors
* Update lib/rocprofiler/aql/tests
- formatting
- update use of get_val
- normalize test names
* Update lib/rocprofiler/counters/tests
- formatting
- patch basic_counters and derived_counters
- normalize test names
* Update lib/rocprofiler/aql/tests
- set_tests_properties
* Update test labels
- fix minor issue with gtest labels
* Update lib/rocprofiler/counters
- formatting
- clang-tidy fixes
* Update lib/rocprofiler/hsa
- fix includes
- formatting
- clang-tidy fixes
- tweak to queue_controller_init interface
* Update lib/rocprofiler
- include fixes
- namespace fixes
- clang-tidy fixes
- formatting
* Update scripts/run-ci.py
- exclude counters/parser from code coverage (generated files)
* Update include/rocprofiler/counters.h
- fix doxygen comment
* Update lib/rocprofiler/aql/packet_construct.cpp
- guard against HSA_AMD_MEMORY_POOL_ACCESS_DISALLOWED_BY_DEFAULT and HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED
* Update lib/rocprofiler/aql/tests
- disable packet_generation_single and packet_generation_multi tests
- the entire implementation rocprofiler::get_ext_table() is incorrect
* Minor fixes before cleanup
* More changes
* More fixes
* More fixes
* source formatting (clang-format v11) (#99)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* Revert PTL submodule
* Update scripts/run-ci.py
- exclude counters/parser from code coverage (generated files)
* Migrating counters state to context
* Linting
* source formatting (clang-format v11) (#101)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* revert run-ci
* Testing fixes
* More test changes
* Fix minor typo
* Small queue change
* Small queue change
* source formatting (clang-format v11) (#102)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* source formatting (clang-format v11) (#105)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* Documentation Change
* More documentation fixes
* source formatting (clang-format v11) (#106)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* Threading fixes
* Threading fixes
* source formatting (clang-format v11) (#107)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* Threading fixes
* More test fixes
* More agent fixes
* More build fixes
* source formatting (clang-format v11) (#109)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* changed test timeouts
* Build fix
* Build fix
* Updates to agent
* source formatting (clang-format v11) (#114)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* cmake formatting (cmake-format) (#113)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* remove git worktree folder
* Doc update
* testing fix
* Another test fix
* More test changes
* Rebase
* source formatting (clang-format v11) (#116)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* Documentation
* source formatting (clang-format v11) (#119)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* PTL Changes
* Minor agent fix for empty labels
* source formatting (clang-format v11) (#120)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* Minor agent fix for empty labels
* Refactor read_map
* source formatting (clang-format v11) (#121)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
* Refactor read_map
* Cache fixes
* source formatting (clang-format v11) (#122)
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
|
||
|
|
a798a26227 |
Agent information w/o using hsa-runtime (#100)
* Agent information w/o using hsa-runtime
- remove lib/rocprofiler/hsa/agent.{hpp,cpp}
- update include/rocprofiler/agent.h
- basically all possible info from /sys/class/kfd/kfd/topology/nodes/*
* Print topology in rocprofiler_lib.agent test
- hack to help diagnose errors
* Update lib/rocprofiler/tests/details/agent.cpp
- use LOG_IF(WARNING, ...) instead of LOG_IF(FATAL, ...)
* Update lib/rocprofiler/tests/agent.cpp
- print rocminfo at beginning of test
- fix comparison of agent handle
- misc other checks
* Update lib/rocprofiler/agent.cpp
- handle unreadable /sys/class/kfd/kfd/topology/nodes/<N>/properties file
* Update lib/tests/buffering/CMakeLists.txt
- increase timeout to 120
- buffering.parallel will timeout when thread sanitizing is enabled
* Update cmake: rocprofiler-drm
- find drm headers and libraries
* Update include/rocprofiler/agent.h
- add family_id field
* Update lib/rocprofiler/agent.cpp
- parse /proc/cpuinfo for name, family, apicid, etc.
- read_topology uses unique pointers to cleanup memory allocations
- implement name and gfxip
* Update lib/rocprofiler/tests/agent.cpp
- improved failure message
- check name/gfxip
- remove check against hsa_agent_t.handle
- this value is dependent on the address of C++ class
* Update lib/rocprofiler/tests/details/agent.cpp
- tweak gfxip_ variable which is broken for CPU
* Update lib/rocprofiler/agent.cpp
- update string handling for name and gfxip
* Update lib/rocprofiler/tests/agent.cpp
- minor output tweak
* Update lib/rocprofiler/registration.{hpp,cpp}
- registration::init_logging() function
* Update lib/rocprofiler/agent.cpp
- fix hex handling of GFX step version
* Update lib/rocprofiler/tests/details/agent.cpp
- fix format string when nearest CPUs not found
* Update lib/rocprofiler/tests/CMakeLists.txt
- exclude details/agent.cpp from being parsed for gtest tests
* Update include/rocprofiler/fwd.h
- add ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_ABI status
* Update lib/rocprofiler/tests/details/agent.{hpp,cpp}
- replace with slightly modified implementation of rocminfo
- primary change was not printing
* Update lib/rocprofiler/tests/agent.cpp
- update test to use rocminfo data
* Update lib/rocprofiler/agent.cpp
- add pc_sampling_configs
- return error on incompatible ABI
* Update counters and counters tests
- rename test names for consistency
- fixed incorrect spelling of derived
* Add lib/rocprofiler/tests/{timestamp,version}.cpp
- add timestamp and version tests for rocprofiler_get_timestamp and rocprofiler_get_version, respectively
* Update lib/rocprofiler/tests/agent
- fix double free of name_str from isa_info_t
* Update include/rocprofiler/agent.h
- comments for rocprofiler_agent_mem_bank_t
- add rocprofiler_dim3_t
- comments for rocprofiler_agent_t
- add new fields to rocprofiler_agent_t
- cu_count
- workgroup_max_size
- workgroup_max_dim
- grid_max_size
- grid_max_dim
- vendor_name
- product_name
- change prototype of rocprofiler_available_agents_cb_t to be const agent**
* Update lib/rocprofiler/agent.cpp
- set size field
- implement:
- product_name
- vendor_name
- workgroup_max_size
- workgroup_max_dim
- grid_max_size
- grid_max_dim
- cu_count
* Update lib/rocprofiler/tests/agent.cpp
- changes for const agent*
* Update samples/pc_sampling
- updates for const agent*
* Update lib/rocprofiler/agent.cpp
- fix ABI compatibility check
- return incompatible if tool agent is larger than our agent
* Update include/rocprofiler/agent.h
- doxygen comments
- make size field of rocprofiler_agent_t uint64_t for consistency
- add gpu_id via /sys/class/kfd/kfd/.../<idx>/gpu_id
- add model_name via /sys/class/kfd/kfd/.../<idx>/name
* Update lib/rocprofiler/agent.cpp
- add read_file function (vector of strings)
- support enum in read_property
- assign model_name and gpu_id fields
- remove unique_id
* Update lib/rocprofiler/tests/details/agent.*
- support family id, ucode_version, sdma_ucode_version
* Update lib/rocprofiler/tests/agent.cpp
- Add rocprofiler_lib.agent_abi test
- Verify family_id, ucode_version, sdma_ucode_version
|
||
|
|
a646c1546c |
rocprofiler library unit tests (#81)
* Update CI and linting workflows
- delete linting workflow
- compile default CI job with clang-tidy
- split out code coverage matrix entry to separate job
- code coverage job runs code coverage 3x
- once for total code coverage
- once for unittests code coverage
- once for samples code coverage
* Update PTL submodule
- improves handling of when thread pool is destroyed in atexit handler
* Update lib/rocprofiler/buffer
- buffer::instance::get_internal_buffer()
- allocate_buffer invokes internal_threading::initialize() on first entry
- update flush routine
- if wait is false, does not wait for task group to finish syncing
- checks for callback pointer
* Update lib/rocprofiler/internal_threading
- modifications to handle destruction of statics before atexit handler is invoked
* Update lib/rocprofiler/registration.cpp
- reorder atexit call in initialize()
- protect finalize from executing more than once
* Add unittests for rocprofiler buffer
* Update CI workflow
- disable fail-fast for sanitizers
- move AddressSanitizer job to top of the list
* Update lib/rocprofiler/tests/buffer/CMakeLists.txt
- do not set memcheck LD_PRELOAD for rocprofiler-lib-buffer-tests
* Update lib/rocprofiler/registration.{hpp,cpp}
- only invoke client finalizers if initialized
- remove invoke_client_initializer
- move invoke_client functions to anonymous namespace (no declaration in header)
- set fini status in finalize
* Update scripts/thread-sanitizer-suppr.txt
- suppress false positive for double mutex lock in external/ptl/source/PTL/TaskGroup.hh
* Restructure lib/rocprofiler/tests
* Update lib/common
- add utility.cpp
- move read_command_line to utility.{hpp,cpp}
- was formerly in config.cpp
* Update lib/rocprofiler
- checks for init status return configuration locked if status is not greater than -1
- in other words, this prevents calling these functions directly (which was possible when the check was for greater than 0)
* Update lib/rocprofiler/context/context.{hpp,cpp}
- provide deactivate_client_contexts and deregister_client_contexts
- these functions are used when the tool fails to configure
* Update lib/rocprofiler/registration.{hpp,cpp}
- internal "public" get_client_offset()
- client ids are offset by a random value to avoid default values behaving correctly
* Update lib/rocprofiler/tests
- fix rocprofiler_lib.registration_lambda_no_result
* Update lib/rocprofiler/tests
- fix rocprofiler_lib.registration_lambda_with_result
* Update lib/rocprofiler/tests
- remove deep bind from rocprofiler_lib.registration_lambda_with_result
* Update lib/rocprofiler/tests
- use RTLD_NOW when dlopen'ing in rocprofiler_lib.registration_lambda_with_result
* Update rocprofiler registration tests
- split registration tests into separate exe that links to shared library
* Formatting
* Update CI workflow
- always checkout submodules via actions/checkout
* Update lib/rocprofiler/buffer.{hpp,cpp}
- fix issue with buffer flushing not working when only called once
* Update rocprofiler lib registration test
- test for buffered callback
* Update include/rocprofiler/rocprofiler.h
- include internal_threading.h header
* Update rocprofiler lib registration test
- add in internal threading for buffered test
|
||
|
|
8be4ca1a04 |
Fix rocprofiler installation (#73)
- install rocprofiler library
- define AMD_INTERNAL_BUILD when including hsa/hsa.h
- install include/rocprofiler/registration.h header
- fix samples/pc_sampling cmake via installed rocprofiler
- fix samples/api_callback_tracing cmake via installed rocprofiler
- fix samples/api_buffered_tracing cmake via installed rocprofiler
- set cmake_minimum_required in samples/CMakeLists.txt
- find dependent packages in rocprofiler-config.cmake.in
- AMDDeviceLibs
- amd_comgr
- hsa-runtime64
- hip
- export rocprofiler-hip and rocprofiler-hsa-runtime libraries
- add Test Install Build step to CI workflow
|
||
|
|
5e4e7b41f1 |
Documentation, sanitizers, and PTL submodule (#71)
* Update scripts/thread-sanitizer-suppr.txt
- ignore data race occasionally triggered by libamdhip64.so
* Update external/CMakeLists.txt
- configure PTL to use locks in task queues
* Update PTL submodule
- tweak to task queues to prevent data race from std::list next pointer
* Add scripts/setup-sanitizer-env.sh
- bash script that exports the {ASAN,LSAN,TSAN}_OPTIONS used by run-ci.py
* Update include/rocprofiler (doxygen)
- fix doxygen grouping
* Update docs workflow
- change concurrency group to be specific to workflow + ref
- this prevents separate PRs triggering this workflow from cancelling each other
|
||
|
|
d3eaacd610 |
Contexts, tracing, include reorg, registration, thread-pool (#65)
* Update scripts/update-doxygen.sh
- ensure build-docs folder exists
* Update scripts/run-ci.py
- exclude files in details subdirectory from code coverage
* Update scripts/thread-sanitizer-suppr.txt
- exclude races in glog
* Update docs/rocprofiler.dox.in
- exclude defines in include/rocprofiler/defines.h from doxygen
- Tweak EXCLUDE_PATTERNS and EXAMPLE_PATTERNS
* Update docs workflow
- trigger workflow whenever there is a change to the public headers (which may be doxygen comments)
* Update include/rocprofiler (reorg and overhaul)
- rocprofiler_status_t additions
- CONTEXT_NOT_FOUND
- CONTEXT_ERROR
- INVALID_CONTEXT_ID
- INVALID_CONTEXT
- BUFFER_BUSY
- rocprofiler_context_is_active func
- rocprofiler_context_is_valid func
- rocprofiler_service_callback_tracing_kind_t update
- remove ROCPROFILER_SERVICE_CALLBACK_TRACING_HELPER_THREAD
- Remove rocprofiler_tracing_helper_thread_operation_t
- Remove rocprofiler_helper_thread_callback_tracer_data_t
- Added rocprofiler_internal_thread_library_t
- Added rocprofiler_at_internal_thread_create
- split rocprofiler.h into several smaller headers
- reworked rocprofiler_status_t values
- added doxygen comments for enums
- replaced rocprofiler_trace_record_operation_kind_t with rocprofiler_trace_operation_t
- use @ instead of / in doxygen comment in rocprofiler_plugin.h
- fix ref to ROCPROFILER_SERVICE_CALLBACK_TRACING_MARKER_API
- end group in fwd.h
- remove PROFILE_COUNTING group in dispatch_profile.h
- remove premature group close in callback_tracing.h
- hsa.h: remove rocprofiler_hsa_trace_data_t
- fwd.h: remove rocprofiler_tracer_callback_data_t
- rename rocprofiler_correlation_id_t.handle to rocprofiler_correlation_id_t.id (consistency)
- fwd.h: add rocprofiler_callback_tracing_record_t
- callback_tracing.h: update rocprofiler_hsa_api_callback_tracer_data_t
- callback_tracing.h: add size fields
- simplify rocprofiler_tracer_callback_t
- removed ROCPROFILER_NONNULL from rocprofiler_get_version
- added rocprofiler_get_timestamp
- ROCPROFILER_STATUS_ERROR_CONFIGURATION_LOCKED in rocprofiler_status_t
- add ROCPROFILER_STATUS_ERROR_THREAD_NOT_FOUND rocprofiler_status_t
- add rocprofiler_buffer_category_t
- rocprofiler_trace_operation_t -> rocprofiler_tracing_operation_t
- rocprofiler_user_data_t union
- tweak rocprofiler_callback_tracing_record_t
- make external_correlation_id non-pointer
- add rocprofiler_user_data_t data field
- tweak rocprofiler_record_header_t
- instead of single uint64_t kind field, have union for category + kind (two u32) with u64 hash
- API extensions for kind id <-> kind string
- API extensions for operation id <-> operation string
- rocprofiler_callback_trace_kind_name_cb_t
- rocprofiler_callback_trace_operation_name_cb_t
- rocprofiler_iterate_callback_trace_kind_names
- rocprofiler_iterate_callback_trace_kind_operation_names
- modify rocprofiler_hsa_api_callback_tracer_data_t data members (remove pointers)
- add rocprofiler_callback_trace_operation_args_cb_t function pointer typedef
- add rocprofiler_iterate_callback_trace_operation_args function
- fixed inconsistent use of *_trace_* vs. *_tracing_* (opting for tracing)
- removed rocprofiler_query_callback_trace_kind_name
- removed rocprofiler_query_callback_kind_operation_name
- Add include/rocprofiler/registration.h
- header dedicated to registering a tool/client with rocprofiler
- this header is not intended to be included by rocprofiler.h
- rocprofiler_client_id_t
- identifier for client tool
- rocprofiler_client_finalize_t
- function pointer prototype for tool-initiated finalization
- rocprofiler_tool_initialize_t
- function pointer prototype for tool initialization (i.e. configuration)
- rocprofiler_tool_finalize_t
- function pointer prototype for tool finalization
- rocprofiler_tool_configure_result_t
- struct returned by tool/client to rocprofiler
- rocprofiler_is_initialized
- function for querying whether tool-induced initialization is possible
- rocprofiler_is_finalized
- function for querying whether rocprofiler has been finalized
- rocprofiler_configure prototype
- this is the function tools implement
- prototype is always marked as having default visibility
- no implementation in rocprofiler
- added typedef for rocprofiler_configure function pointer
- added rocprofiler_force_configure to explicitly invoke rocprofiler_configure instead of relying on lazy init
- made callback typedef names more consistent (_cb_t suffix)
- typedef for rocprofiler_internal_thread_library_cb_t function pointer
- added rocprofiler_at_internal_thread_create function
- added rocprofiler_callback_thread_t struct
- added rocprofiler_create_callback_thread function
- added rocprofiler_assign_callback_thread function
- removed rocprofiler_buffer_tracing_record_header_t in favor of kind and correlation id in each record type
- added rocprofiler_buffer_tracing_kind_name_cb_t typedef
- added rocprofiler_buffer_tracing_operation_name_cb_t typedef
- added rocprofiler_iterate_buffer_tracing_kind_names function
- added rocprofiler_iterate_buffer_tracing_kind_operation_names function
- removed rocprofiler_query_buffer_trace_kind_name function
- removed rocprofiler_query_buffer_kind_operation_name function
* Update lib/common/container/stable_vector.hpp
- include limits header
- reserve_size struct
- overload stable_vector constructor to support reserving as part of construction
* Update lib/common/container/record_header_buffer.{hpp,cpp}
- add emplace member function accepting category and kind (two u32 variables) instead of one u64 kind
- use std::shared_mutex to prevent data-race when reading m_headers
- record_header_buffer is now multiple writer, single reader
- add read_lock member function (shared)
- add read_unlock member function (shared)
- lock member function gets exclusive lock
- unlock member function releases exclusive lock
* Rename "config" to "context" + restructure + implement
- Restructure config files + license
- move config files into lib/rocprofiler/config subfolder
- rename some files
- add license to some files which were missing it
- Rename config/helpers.hpp
- rename to allocator.hpp
- remove get_domain_max_ops
- Create config/domain.{hpp,cpp}
- structures for handling tracing domains and ops
- Update config/config.{hpp,cpp}
- buffer_instance struct
- callback_tracing_service struct
- buffer_tracing_service struct
- config struct
- allocate_{config,buffer} func
- {validate,start,stop}_config funcs
- get_registered_configs func
- get_active_configs func
- get_buffers func
- Update rocprofiler.cpp
- Implement rocprofiler_create_context
- Implement rocprofiler_start_context
- Implement rocprofiler_stop_context
- Implement rocprofiler_context_is_active
- Implement rocprofiler_context_is_valid
- Implement rocprofiler_flush_buffer
- Implement rocprofiler_destroy_buffer
- Implement rocprofiler_create_buffer
- Update lib/rocprofiler/hsa
- use rocprofiler_tracer_activity_domain_t instead of rocprofiler_tracer_activity_domain_t
- remove ROCPROFILER_TRACER_ACTIVITY_DOMAIN_HSA_API from HSA_API_INFO_DEFINITION_* macros
- Update lib/rocprofiler/context/domain.*
- fixes for domain_info (i.e. use correct enums)
- update rocprofiler_status_t codes
- fix template instantiations
- Update lib/rocprofiler/context/context.*
- use rocprofiler_service_callback_tracing_kind_t instead of rocprofiler_tracer_activity_domain_t
- rename correlation_context to correlation_tracing_service
- fix domains in callback_tracing_service and buffer_tracing_service
- unique_ptr for callback_tracer and buffered_tracer in context
- Update lib/rocprofiler/rocprofiler.cpp
- implement rocprofiler_configure_callback_tracing_service
- Update lib/rocprofiler/hsa/ostream.hpp
- include rocprofiler.h instead of tracer.hpp
- Update lib/rocprofiler/hsa
- migration to use rocprofiler_hsa_api_callback_tracer_data_t instead of rocprofiler_hsa_trace_data_t
- restructure hsa_api_impl<Idx>
- remove phase_enter and phase_exit
- add set_data_args (partial replacement for phase_enter)
- functor handles the contexts
- Update lib/rocprofiler/rocprofiler.cpp
- implement rocprofiler_get_version
- Update lib/rocprofiler/hsa/hsa.{hpp,cpp}
- remove hsa_api_ prefix for functions already in hsa namespace
- Update lib/rocprofiler/context/context.{hpp,cpp}
- add client_idx to context struct (tool identifier)
- add push_client function to set client_idx before context is allocated
- add pop_client function to remove client identifier from future context creations
- implemented {registered,active}_contexts and buffers to use new container::reserve_size overload to stable_vector
- fix implementation of start_context
- fix implementation of stop_context
- Update lib/rocprofiler/rocprofiler.cpp
- prevent context creation, buffer creation, pc sampling config, etc. after initialization
- add nullptr checks to rocprofiler_context_is_valid
- fix rocprofiler_configure_callback_tracing_service
- was checking size of buffers, not registered context
- implement rocprofiler_iterate_callback_trace_kind_names
- implement rocprofiler_iterate_callback_trace_kind_operation_names
- Update lib/rocprofiler/CMakeLists.txt
- add registration.{hpp,cpp} to rocprofiler-library target sources
- Update lib/rocprofiler/hsa/utils.hpp
- fix using fmt::format with const char* strings
- remove join functions (no longer used)
- Update lib/rocprofiler/hsa/hsa.{hpp,cpp}
- remove args_string function
- remove named_args_string function
- update iterate_args function
- change callback type
- accept user data
- rework the hsa_api_impl<Idx>::functor function
- save the rocprofiler_callback_tracing_record_t between callbacks
- update update_table function
- check buffered_tracer domains
- remove comments
- Update lib/rocprofiler/hsa/defines.hpp
- remove MEMBER_<N> macros
- add ADDR_MEMBER_<N> macros
- remove doxygen comments for GET_MEMBER_FIELDS
- add GET_ADDR_MEMBER_FIELDS
- update HSA_API_INFO_DEFINITION_{0,V}
- rename domain_idx to callback_domain_idx
- add buffered_domain_idx
- add as_arg_addr function
- Update lib/rocprofiler/rocprofiler.cpp
- implement rocprofiler_iterate_callback_trace_operation_args
- Remove lib/rocprofiler/tracing.{hpp,cpp} and lib/rocprofiler/CMakeLists.txt
- unused
- Update lib/rocprofiler/hsa/hsa.{hpp,cpp}
- support buffered tracing in hsa_api_impl<Idx>::functor
- rocprofiler_callback_trace_operation_args_cb_t -> rocprofiler_callback_tracing_operation_args_cb_t
- i.e. trace -> tracing
- Update lib/rocprofiler/context/context.{hpp,cpp}
- removed buffer_instance struct
- removed allocate_buffer function
- removed get_buffers function
- changed buffer_tracing_service::buffer_array_t
- Update lib/rocprofiler/hsa: hsa.cpp, ostream.hpp, details folder
- move ostream.hpp into details folder to prevent from contributing to code coverage
- update cmake build system for new directory
* Add lib/rocprofiler/registration.{hpp,cpp}
- implements rocprofiler_set_api_table (called by rocprofiler-register)
- miscellaneous functions for client configure/initialize/finalize
- functions for querying the init/fini status
- relocated OnLoad HSA workaround to this file
- at present, this is used to work around ROCr not having rocprofiler-register integration yet
- implement rocprofiler_force_configure function
- implement rocprofiler_is_initialized function
- implement rocprofiler_is_finalized function
- ensure configure functions only invoked once
- ensure internal thread creation notification functions are invoked
- get_status is pair of atomics
- fix heap-use-after-free in init_logging
- update finalize
- invoke hsa_shut_down
- set all active contexts to null pointers
* Add lib/rocprofiler/buffer_tracing.cpp
- contains implementations of buffer_tracing (i.e. rocprofiler/buffer_tracing.h)
- previous implementation may have been moved out of lib/rocprofiler/rocprofiler.cpp
* Add lib/rocprofiler/buffer.{hpp,cpp}
- contains implementations of buffer (i.e. rocprofiler/buffer.h) and misc internal access functions
- previous implementation may have been moved out of lib/rocprofiler/rocprofiler.cpp and lib/rocprofiler/context/context.{hpp,cpp}
* Add lib/rocprofiler/callback_tracing.cpp
- contains implementations of callback_tracing (i.e. rocprofiler/callback_tracing.h)
- previous implementation may have been moved out of lib/rocprofiler/rocprofiler.cpp
* Add lib/rocprofiler/context.cpp
- contains implementations of context public API functions (i.e. rocprofiler/context.h)
- previous implementation may have been moved out of lib/rocprofiler/rocprofiler.cpp
* Add lib/rocprofiler/internal_threading.{hpp,cpp}
- contains implementations of internal_threading (i.e. rocprofiler/internal_threading.h)
- also contains implementations of internal access functions
- update finalize function
- join all task groups and destroy all thread pools first, then reset unique_ptr
* Update lib/rocprofiler/rocprofiler.cpp
- rocprofiler_get_version returns status
- implement rocprofiler_get_timestamp
- remove misc implementations that were split into other files
* Update lib/rocprofiler/CMakeLists.txt
- compile new implementation files
- buffer.cpp
- buffer_tracing.cpp
- callback_tracing.cpp
- context.cpp
- internal_threading.cpp
* Update lib/tests/buffering/buffering-*.cpp
- update to reflect changes to rocprofiler_record_header_t
* Update CMakeLists.txt
- increase minimum cmake version to 3.21 which added HIP support as a language
* Add samples/apps/transpose
- simple HIP application for testing
* Add samples/api_callback_tracing
- HIP application and tool library
- This effectively demos how to set up HSA API tracing
- For each function called in tool, it stores the func/file/line and prints it during finalization
- client.hpp and client.cpp are the tool library
- Implement use of rocprofiler_iterate_callback_trace_operation_args
- add demo of using rocprofiler_get_version
- add_test
- remove PASS_REGULAR_EXPRESSION
- causing false passes during memcheck
- add ROCPROFILER_MEMCHECK_PRELOAD_ENV to environment
- check if rocprofiler is initialized before stopping context
* Add samples/api_buffered_tracing
- Sample demonstrating tracing the HSA API via buffering
- demo rocprofiler_record_header_compute_hash
- throw exceptions for unexpected buffer data
- add_test
- remove PASS_REGULAR_EXPRESSION
- causing false passes during memcheck
- add ROCPROFILER_MEMCHECK_PRELOAD_ENV to environment
* Update samples/CMakeLists.txt
- add subdirectory for api_callback_tracing
- add subdirectory api_buffered_tracing
* Update samples/pc_sampling/common.h
- fix processing of headers
* Update lib/rocprofiler/hsa/details/ostream.hpp
- fix data race on HSA_depth_max_cnt and recursion
- HSA_depth_max_cnt and recursion are now thread-local static instead of global static
- replace std::string usage with std::string_view
* Actions update
- add dependabot.yml
- use actions/checkout@v4
- install latest libasan and libtsan in sanitizer containers
* Add PTL (Parallel Tasking Library) submodule
|
||
|
|
3769bb7dbf |
Minor documentation workflow updates (#53)
* Document rocprofiler version defines
- write doxygen for preprocessor defines
- make ROCPROFILER_SOVERSION number similar to ROCPROFILER_VERSION
- remove ROCPROFILER_COMPILER_STRING
* Update rocprofiler.dox.in
- reformatted
- include rocprofiler/version.h in doxygen
- tweaked dot settings, e.g. made dot SVGs non-interactive
* Update scripts/update-docs.sh
- configure with cmake ROCPROFILER_INTERNAL_BUILD_DOCS=ON which just generates version.h and exits
* Update CMakeLists.txt
- support ROCPROFILER_INTERNAL_BUILD_DOCS=ON option for generating version.h and exiting
|
||
|
|
c0cb907fee |
Support different HSA table sizes (#44)
* Support different HSA table sizes
- Use hsa-runtime64_VERSION to define pp defs for major and minor version in version.h.in
- Update version.h.in to define ROCPROFILER_HSA_RUNTIME_VERSION_{MAJOR,MINOR}
- Use HSA_AMD_INTERFACE_VERSION_{MAJOR,MINOR} to handle hsa_amd_vmem_* support
- add template specializations for hsa_amd_vmem_* functions
- implement HSA version based static asserts
* Debug commit
- print pp value for ROCPROFILER_HSA_RUNTIME_VERSION and ROCPROFILER_HSA_RUNTIME_EXT_AMD_VERSION
* Debug commit
- fix ROCPROFILER_HSA_RUNTIME_VERSION value
* Remove debug edits
* Update lib/rocprofiler/hsa/utils.hpp
- support outputting:
- hsa_amd_memory_pool_t
- hsa_amd_vmem_alloc_handle_t
- hsa_amd_memory_access_desc_t
- hsa_amd_memory_pool_t
* Update lib/rocprofiler/hsa/utils.hpp
- tweak to join_impl
* Update lib/rocprofiler/hsa/utils.hpp
- use formatting when possible
* Update lib/rocprofiler/hsa/types.hpp
- Support API_TABLE_MAJOR_VERSIONS > 1
* Update lib/rocprofiler/hsa/types.hpp
- remove inherit from undefined template specialization
* Update lib/rocprofiler/hsa/utils.hpp
- remove duplicate formatter specialization
* Update include/rocprofiler/hsa/api_args.h
- remove const from non-pointer anonymous structs in union
* Use HSA_AMD_EXT_API_TABLE_MAJOR_VERSION
|
||
|
|
729c34fb60 |
Docs skeleton (#51)
* Add doxygen-awesome-css submodule
* Basic documentation files
- conf.py: run by sphinx
- about.md: info about rocprofiler
- features.md: overview of features
- installation.md: build/test/install instructions
- index.md: sets up main page
- generate-doxyfile.cmake: generates rocprofiler.dox with rocprofiler-specific info
- environment.yml: conda environment
- Makefile: sphinx makefile
- README.md: build instructions
- rocprofiler.dox.in: doxygen template
- .gitignore: ignores generated files
- .nojekyll: prevents GitHub Pages from using Jekyll for deployment of pages
* Documentation scripts
- scripts for doing common sequences of commands for building docs
- update-docs.sh: builds the docs and installs the docs if /docs directory is present
- update-doxygen.sh: quick script for generating doxygen
* Workflow for docs
- step for building docs
- step for deploying docs
* Update doxygen comments in include/rocprofiler
- rocprofiler.h / rocprofiler_plugins.h
- fixed non-existent global references in doxygen comments
- fixed parameter names that were incorrect or not updated
* Update docs workflow
- only deploy docs when on main branch
|
||
|
|
b12ef4a75e |
Buffering: initial implementation and tests (#20)
* Update source/lib/common
- CMakeLists.txt
- less verbose
- rocprofiler-common-library uses rocprofiler-headers target
- mpl.hpp
- metaprogramming header with type_list, size_of, index_of, and is_one_of
- record_header_buffer.{hpp,cpp}
- wrapper class around atomic_ring_buffer and vector of rocprofiler_record_header_t
- atomic_ring_buffer.{hpp,cpp}
- request function accepts wrap param when overwriting is not desirable
- can_clear member function
- clear member function for rewinding write pointer to start of buffer
- containers/CMakeLists.txt
- include record_header_buffer.{hpp,cpp} in build target
* Update source/lib/tests: Buffering tests
- Added buffering tests. See comments in code for description
* atomic_ring_buffer -> ring_buffer
- remove ring_buffer implementation
- rename atomic_ring_buffer to ring_buffer
* Update record_header_buffer
- lock, unlock, is_locked, clear, save, and load member functions
* Buffering tests
- add buffer test for save/load capability
* Update rocprofiler_memcheck.cmake
- fix erroneous spaces causing incorrect string evaluation
* Update ring_buffer
- fix exception message
* undef HIP_PROF_API
- make sure HIP_PROF_API is undefined before including hip_runtime.h
- avoid directly including hip/hip_runtime.h
* Update rocprofiler_config_interfaces
- remove stale preprocessor defines that are from old rocprofiler/roctracer
- HIP_PROF_HIP_API_STRING=1
- PROF_API_IMPL=1
* Update run-ci.py
- fix paths to suppression files
- improve printing logs to console in github actions
* Update buffering implementation
- remove support for using malloc instead of mmap in ring_buffer
- provide some info functions in record_header_buffer
- improve the testing of the save-load buffer test
* Update run-ci.py
- fix CTEST_CUSTOM_COVERAGE_EXCLUDE
* Update hip/api_args.h
- remove undef HIP_PROF_API
* Update buffering-save-load.cpp
- updated comments
* Update record_header_buffer
- default ctor
- allocate member function
- is_allocated member function
* Update buffering-save-load.cpp
- tweaked usage of record_header_buffer to delay allocation
|
||
|
|
41b1d91841 |
SortIncludes: true (#19)
* Update .clang-format
- set SortIncludes to true
* Reformat source with includes sorted
|
||
|
|
39b209c2a7 |
Updated rocprofiler.h for v2 (#18)
* Update and rename rocprofiler.h to rocprofiler.h.in
- Removing Service IDs
- Fixing agent_id to be agent
* [0/N] New rocprofiler headers
- created rocprofiler/defines.h
- ppdef macros
- created rocprofiler/hip.h
- HIP specific types
- created rocprofiler/hsa.h
- HSA specific types
- created rocprofiler/marker.h
- Marker (ROCTx) specific types
- create version.h.in
- file containing version info
- updated source/lib/rocprofiler/CMakeLists.txt
- set DEFINE_SYMBOL
- compile defs provided by rocprofiler::rocprofiler-headers
* [1/N] Update rocprofiler.h
- pragma once
- removed some ppdefs (in version.h.in and defines.h)
- extern "C" after includes
- added *_NONE and *_LAST enum values to all enums
- provided some rocprofiler_status_t enums
- tweaked rocprofiler_agent_type_t enum fields
- tweaked rocprofiler_agent_info_t enum fields
- provided rocprofiler_tracer_activity_domain_t
- added missing rocprofiler_counter_instance_id_t typedef
- may not be correct
- provided rocprofiler_record_header_t struct
- provided rocprofiler_record_tracer_t struct
- add ROCPROFILER_NONNULL attribute where appropriate
- CMakeLists.txt: add subdirectories for hsa, hip, and marker
- defines.h: remove ROCPROFILER_CALL ppdef
- rocprofiler.h
- ROCPROFILER_STATUS_ERROR_NOT_IMPLEMENTED
- extend rocprofiler_agent_t
- modify rocprofiler_query_available_agents signature to callback
- rocprofiler_pc_sampling_config_array_t
- update rocprofiler_buffer_callback_t to include context id
- update rocprofiler_create_buffer to accept context
- rocprofiler_plugin.h
- non-const rocprofiler_record_header_t**
* [2/N] Update include/rocprofiler/rocprofiler_plugin.h
- change prototype of rocprofiler_plugin_write_buffer_records to resemble rocprofiler_buffer_callback_t
* [3/N] Update include/rocprofiler/hsa
- Update hsa.h
- Details in hsa subfolder
* [4/N] Update include/rocprofiler/hip
- Update hip.h
- Details in hip subfolder
* [5/N] Update include/rocprofiler/marker
- Update marker.h
- Details in marker subfolder
* [6/N] Update samples/pc_sampling
- fix issues with macros
- fix API changes
---------
Co-authored-by: Jonathan Madsen <jrmadsen@users.noreply.github.com>
|
||
|
|
351d825a8d |
Initial skeleton (revised) (#16)
* [0/N] git submodules
* [1/N] Update cmake, gitignore, external
- clang-tidy file
- update .gitignore
- update main CMakeLists.txt
- update external/CMakeLists.txt
- update rocprofiler_config_interfaces.cmake
- update rocprofiler_formatting.cmake
- update rocprofiler_interfaces.cmake
- update rocprofiler_linting.cmake
- update rocprofiler_options.cmake
- update rocprofiler_utilities.cmake
* [2/N] Update rocprofiler/config.h
- update to work with new rocprofiler.h
* [3/N] Update source/lib/rocprofiler/hsa
- hsa-types.h: static asserts
- hsa.cpp: copyTables scope
- hsa.gen.cpp: ACTIVITY_DOMAIN_HSA_API -> ROCPROFILER_TRACER_ACTIVITY_DOMAIN_HSA_API
- rename some files
- add rocprofiler_ prefix to types and enums
- HSA_API_TABLE_LOOKUP_DEFINITION macro
- get_saved_table() -> get_table()
* [4/N] Update source/lib/common
- CMake: change target_link_libraries
- defines.hpp: remove ppdefs defined in include/rocprofiler/defines.h
* [5/N] Update source/lib/rocprofiler
- updates due to changes in rocprofiler.h
- rocprofiler_config.cpp: remove unions which are now defined in include/rocprofiler
- CMakeLists.txt: rocprofiler.cpp and public hsa-runtime and hip libraries
- rocprofiler.cpp: dummy implementations for:
- rocprofiler_query_available_agents
- rocprofiler_create_context
- rocprofiler_start_context
- rocprofiler_stop_context
- rocprofiler_flush_buffer
- rocprofiler_destroy_buffer
* [6/N] Update license
- replace stale LBNL license
* [7/N] CMake format
|
||
|
|
527aa71f5a |
Initial skeleton (#1)
* googletest submodule
* cmake folder
* misc root files
- clang-format
- cmake-format
- pyproject.toml
- requirements.txt
- VERSION
* workflows
* RPM files
* external folder
* samples folder
* tests root folder
* source/bin folder
* source/include folder
* source/lib/common folder
* source/lib/plugins folder
* source/lib/tests folder
- for library unit tests
* source/lib/rocprofiler folder
- rocprofiler library implementation
* Remaining cmake files
* lib/common/containers
- ring_buffer
- atomic_ring_buffer
- stable_vector
- static_vector
* Update .gitignore
* Update hsa.hpp
- include cstdint
* cmake formatting (cmake-format) (#2)
Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>
* Remove linting.yml
- uses self-hosted runners
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
|