Файли
Gopesh Bhardwaj c563286f96 Update changelog for ROCprofiler-SDK 1.1.0 (#2717)
using only arch name
2026-01-21 20:15:39 +05:30

12 KiB

Changelog for ROCprofiler-SDK

Full documentation for ROCprofiler-SDK is available at rocm.docs.amd.com/projects/rocprofiler-sdk

ROCprofiler-SDK for AFAR I

Added

  • HSA API tracing
  • Kernel dispatch tracing
  • Kernel dispatch counter collection
    • Instances reported as single dimension
    • No serialization

ROCprofiler-SDK for AFAR II

Added

  • HIP API tracing
  • ROCTx tracing
  • Tracing ROCProf Tool V3
  • Documentation packaging
  • ROCTx control (start and stop)
  • Memory copy tracing

ROCprofiler-SDK for AFAR III

Added

  • Kernel dispatch counter collection. This includes serialization and multidimensional instances.
  • Kernel serialization.
  • Serialization control (on and off).
  • ROCprof tool plugin interface V3 for counters and dimensions.
  • Support to list metrics.
  • Correlation-Id retirement
  • HIP and HSA trace distinction:
    • --hip-runtime-trace For collecting HIP Runtime API traces
    • --hip-compiler-trace For collecting HIP compiler-generated code traces
    • --hsa-core-trace For collecting HSA API traces (core API)
    • --hsa-amd-trace For collecting HSA API traces (AMD-extension API)
    • --hsa-image-trace For collecting HSA API traces (image-extension API)
    • --hsa-finalizer-trace For collecting HSA API traces (finalizer-extension API)

ROCprofiler-SDK for AFAR IV

Added

API:

  • Page migration reporting
  • Scratch memory reporting
  • Kernel dispatch callback tracing
  • External correlation Id request service
  • Buffered counter collection record headers
  • Option to remove HSA dependency from counter collection

Tool:

  • rocprofv3 multi-GPU support in a single-process

ROCprofiler-SDK for AFAR V

Added

API:

  • Agent or device counter collection
  • PC sampling (beta)

Tool:

  • Single JSON output format support
  • Perfetto output format support (.pftrace)
  • Input YAML support for counter collection
  • Input JSON support for counter collection
  • Application replay in counter collection
  • rocprofv3 multi-GPU support:
    • Multiprocess (multiple files)

Changed

  • rocprofv3 tool now requires mentioning -- before the application. For detailed use, see Using rocprofv3

Resolved issues

  • Fixed SQ_ACCUM_PREV and SQ_ACCUM_PREV_HIRE overwriting issue

ROCprofiler-SDK 0.4.0 for ROCm release 6.2 (AFAR VI)

Added

  • OTF2 tool support
  • Kernel and range filtering
  • Counter collection definitions in YAML
  • Documentation updates (SQ block, counter collection, tracing, tool usage)
  • rocprofv3 option --kernel-rename
  • rocprofv3 options for Perfetto settings (buffer size and so on)
  • CSV columns for kernel trace
    • Thread_Id
    • Dispatch_Id
  • CSV column for counter collection

ROCprofiler-SDK 0.5.0 for ROCm release 6.3 (AFAR VII)

Added

  • Start and end timestamp columns to the counter collection csv output
  • Check to force tools to initialize context id with zero
  • Support to specify hardware counters for collection using rocprofv3 as rocprofv3 --pmc [COUNTER [COUNTER ...]]
  • Memory Allocation Tracing
  • PC sampling tool support with CSV and JSON output formats
  • List supported PC Sampling Configurations

Changed

  • --marker-trace option for rocprofv3 now supports the legacy ROCTx library libroctx64.so when the application is linked against the new library librocprofiler-sdk-roctx.so.
  • Replaced deprecated hipHostMalloc and hipHostFree functions with hipExtHostAlloc and hipFreeHost for ROCm versions starting 6.3.
  • Updated rocprofv3 --help options.
  • Changed naming of "agent profiling" to a more descriptive "device counting service". To convert existing tool or user code to the new name, use the following sed: find . -type f -exec sed -i 's/rocprofiler_agent_profile_callback_t/rocprofiler_device_counting_service_callback_t/g; s/rocprofiler_configure_agent_profile_counting_service/rocprofiler_configure_device_counting_service/g; s/agent_profile.h/device_counting_service.h/g; s/rocprofiler_sample_agent_profile_counting_service/rocprofiler_sample_device_counting_service/g' {} +
  • Changed naming of "dispatch profiling service" to a more descriptive "dispatch counting service". To convert existing tool or user code to the new names, the following sed can be used: -type f -exec sed -i -e 's/dispatch_profile_counting_service/dispatch_counting_service/g' -e 's/dispatch_profile.h/dispatch_counting_service.h/g' -e 's/rocprofiler_profile_counting_dispatch_callback_t/rocprofiler_dispatch_counting_service_callback_t/g' -e 's/rocprofiler_profile_counting_dispatch_data_t/rocprofiler_dispatch_counting_service_data_t/g' -e 's/rocprofiler_profile_counting_dispatch_record_t/rocprofiler_dispatch_counting_service_record_t/g' {} +
  • FETCH_SIZE metric on gfx94x now uses TCC_BUBBLE for 128B reads.
  • PMC dispatch-based counter collection serialization is now per-device instead of being global across all devices.
  • Added output return functionality to rocprofiler_sample_device_counting_service
  • Added rocprofiler_load_counter_definition.

Resolved issues

  • Create subdirectory when rocprofv3 --output-file includes a folder path
  • Fixed misaligned stores (undefined behavior) for buffer records
  • Fixed crash when only scratch reporting is enabled
  • Fixed MeanOccupancy metrics
  • Fixed aborted-application validation test to properly check for hipExtHostAlloc command
  • Fixed implicit reduction of SQ and GRBM metrics
  • Fixed support for derived counters in reduce operation
  • Bug fixed in max-in-reduce operation
  • Introduced fix to handle a range of values for select() dimension in expressions parser
  • Conditional aql::set_profiler_active_on_queue only when counter collection is registered (resolves Navi3 kernel tracing issues)

Removed

  • Removed gfx8 metric definitions
  • Removed rocprofv3 installation to sbin directory

ROCprofiler-SDK 0.6.0 for ROCm release 6.4

Added

  • Support for select() operation in counter expression.
  • reduce() operation for counter expression with respect to dimension.
  • --collection-period feature in rocprofv3 to enable filtering using time.
  • --collection-period-unit feature in rocprofv3 to control time units used in collection period option.
  • Deprecation notice for ROCProfiler and ROCProfilerV2.
  • Support for rocDecode API Tracing
  • Usage documentation for ROCTx
  • Usage documentation for MPI applications
  • SDK: rocprofiler_agent_v0_t support for agent UUIDs
  • SDK: rocprofiler_agent_v0_t support for agent visibility based on gpu isolation environment variables such as ROCR_VISIBLE_DEVICES and so on.
  • Accumulation VGPR support for rocprofv3.
  • Host-trap based PC sampling support for rocprofv3.
  • Support for OpenMP tool.

ROCprofiler-SDK 1.0.0 for ROCm release 7.0

Added

  • Support for rocJPEG API Tracing.
  • Support for AMD Instinct MI350X and MI355X accelerators.
  • rocprofiler_create_counter to facilitate adding custom derived counters at runtime.
  • Support in rocprofv3 for iteration based counter multiplexing.
  • Perfetto support for counter collection.
  • Support for negating rocprofv3 tracing options when using aggregate options such as --sys-trace --hsa-trace=no.
  • --agent-index option in rocprofv3 to specify the agent naming convention in the output:
    • absolute == node_id
    • relative == logical_node_id
    • type-relative == logical_node_type_id
  • MI300 and MI350 stochastic (hardware-based) PC sampling support in ROCProfiler-SDK and rocprofv3.
  • Python bindings for rocprofiler-sdk-roctx
  • SQLite3 output support for rocprofv3 using --output-format rocpd.
  • rocprofiler-sdk-rocpd package:
    • Public API in include/rocprofiler-sdk-rocpd/rocpd.h.
    • Library implementation in librocprofiler-sdk-rocpd.so.
    • Support for find_package(rocprofiler-sdk-rocpd).
    • rocprofiler-sdk-rocpd DEB and RPM packages.
  • --version option in rocprofv3.
  • rocpd Python package.
  • Thread trace as experimental API.
  • ROCprof Trace Decoder as experimental API:
  • Thread trace option in the rocprofv3 tool under the --att parameters:
  • rocpd output format documentation:
  • Perfetto support for scratch memory.
  • Support in the rocprofv3 avail tool for command-line arguments.
  • Documentation for rocprofv3 advanced options.
  • Support for multi dispatch ATT file added

Changed

  • SDK to NOT to create a background thread when every tool returns a nullptr from rocprofiler_configure.
  • vaddr-to-file-offset mapping in disassembly.hpp to use the dedicated comgr API.
  • rocprofiler_uuid_t ABI to hold 128 bit value.
  • rocprofv3 shorthand argument for --collection-period to -P (upper-case) while -p (lower-case) is reserved for later use.
  • Default output format for rocprofv3 to rocpd (SQLite3 database).
  • rocprofv3 avail tool to be renamed from rocprofv3_avail to rocprofv3-avail tool.
  • rocprofv3 tool to facilitate thread trace and PC sampling on the same agent.

Resolved issues

  • Fixed missing callbacks around internal thread creation within counter collection service.
  • Fixed potential data race in the ROCprofiler-SDK double buffering scheme.
  • Fixed usage of std::regex in the core ROCprofiler-SDK library that caused segfaults or exceptions when used under dual ABI.
  • Fixed Perfetto counter collection by introducing accumulation per dispatch.
  • Fixed code object disassembly for missing function inlining information.
  • Fixed queue preemption error and HSA_STATUS_ERROR_INVALID_PACKET_FORMAT error for stochastic PC-sampling in MI300X, leading to stabler runs.
  • Fixed the system hang issue for host-trap PC-sampling on MI300X.
  • Fixed rocpd counter collection issue when counter collection alone is enabled. rocpd_kernel_dispatch table is updated to be populated by counters data instead of kernel_dispatch data.
  • Fixed rocprofiler_*_id_t structs for inconsistency related to a "null" handle:
    • The correct definition for a null handle is .handle = 0 while some definitions previously used UINT64_MAX.
  • Fixed kernel trace csv output generated by rocpd.

Removed

  • Support for compilation of gfx940 and gfx941 targets.

ROCprofiler-SDK 1.1.0 for ROCm release 7.1

Added

  • Dynamic process attachment- ROCprofiler-sdk and rocprofv3 now facilitate dynamic profiling of a running GPU applications by attaching to its process ID (PID), rather than launching the application through the profiler itself.
  • Scratch-memory trace information to the Perfetto output in rocprofv3.
  • New capabilities to the thread trace support in rocprofv3, including real-time clock support for thread trace alignment on gfx9 architecture. This enables high-resolution clock computation and better synchronization across shader engines. Additionally, MultiKernelDispatch thread trace support is now available across all ASICs.
  • Documentation for dynamic process attachment.
  • Documentation for rocpd summaries.

Optimized

  • Improved the stability and robustness of the rocpd output.

ROCprofiler-SDK 1.1.0 for ROCm release 7.2

Added

  • Counter collection support for gfx1150 and gfx1151.
  • HSA Extension API v8 support.
  • hipStreamCopyAttributes API implementation.

Optimized

Resolved issues

  • Fixed multi-GPU dimension mismatch.
  • Fixed device lock issue for dispatch counters.
  • Addressed OpenMP Tools task scheduling null pointer exception.
  • Fixed stream ID errors arising during process attachment.
  • Fixed issues arising during dynamic code object loading.