# Changelog for ROCprofiler-SDK Full documentation for ROCprofiler-SDK is available at [rocm.docs.amd.com/projects/rocprofiler-sdk](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/index.html) ## ROCprofiler-SDK for AFAR I ### Added - HSA API tracing - Kernel dispatch tracing - Kernel dispatch counter collection - Instances reported as single dimension - No serialization ## ROCprofiler-SDK for AFAR II ### Added - HIP API tracing - ROCTx tracing - Tracing ROCProf Tool V3 - Documentation packaging - ROCTx control (start and stop) - Memory copy tracing ## ROCprofiler-SDK for AFAR III ### Added - Kernel dispatch counter collection. This includes serialization and multidimensional instances. - Kernel serialization. - Serialization control (on and off). - ROCprof tool plugin interface V3 for counters and dimensions. - Support to list metrics. - Correlation-Id retirement - HIP and HSA trace distinction: - --hip-runtime-trace For collecting HIP Runtime API traces - --hip-compiler-trace For collecting HIP compiler-generated code traces - --hsa-core-trace For collecting HSA API traces (core API) - --hsa-amd-trace For collecting HSA API traces (AMD-extension API) - --hsa-image-trace For collecting HSA API traces (image-extension API) - --hsa-finalizer-trace For collecting HSA API traces (finalizer-extension API) ## ROCprofiler-SDK for AFAR IV ### Added **API:** - Page migration reporting - Scratch memory reporting - Kernel dispatch callback tracing - External correlation Id request service - Buffered counter collection record headers - Option to remove HSA dependency from counter collection **Tool:** - `rocprofv3` multi-GPU support in a single-process ## ROCprofiler-SDK for AFAR V ### Added **API:** - Agent or device counter collection - PC sampling (beta) **Tool:** - Single JSON output format support - Perfetto output format support (.pftrace) - Input YAML support for counter collection - Input JSON support for counter collection - Application replay in counter collection - `rocprofv3` multi-GPU support: - Multiprocess (multiple files) ### Changed - `rocprofv3` tool now requires mentioning `--` before the application. For detailed use, see [Using rocprofv3](source/docs/how-to/using-rocprofv3.rst) ### Resolved issues - Fixed `SQ_ACCUM_PREV` and `SQ_ACCUM_PREV_HIRE` overwriting issue ## ROCprofiler-SDK 0.4.0 for ROCm release 6.2 (AFAR VI) ### Added - OTF2 tool support - Kernel and range filtering - Counter collection definitions in YAML - Documentation updates (SQ block, counter collection, tracing, tool usage) - `rocprofv3` option `--kernel-rename` - `rocprofv3` options for Perfetto settings (buffer size and so on) - CSV columns for kernel trace - `Thread_Id` - `Dispatch_Id` - CSV column for counter collection ## ROCprofiler-SDK 0.5.0 for ROCm release 6.3 (AFAR VII) ### Added - Start and end timestamp columns to the counter collection csv output - Check to force tools to initialize context id with zero - Support to specify hardware counters for collection using rocprofv3 as `rocprofv3 --pmc [COUNTER [COUNTER ...]]` - Memory Allocation Tracing - PC sampling tool support with CSV and JSON output formats - List supported PC Sampling Configurations ### Changed - `--marker-trace` option for `rocprofv3` now supports the legacy ROCTx library `libroctx64.so` when the application is linked against the new library `librocprofiler-sdk-roctx.so`. - Replaced deprecated `hipHostMalloc` and `hipHostFree` functions with `hipExtHostAlloc` and `hipFreeHost` for ROCm versions starting 6.3. - Updated `rocprofv3` `--help` options. - Changed naming of "agent profiling" to a more descriptive "device counting service". To convert existing tool or user code to the new name, use the following sed: `find . -type f -exec sed -i 's/rocprofiler_agent_profile_callback_t/rocprofiler_device_counting_service_callback_t/g; s/rocprofiler_configure_agent_profile_counting_service/rocprofiler_configure_device_counting_service/g; s/agent_profile.h/device_counting_service.h/g; s/rocprofiler_sample_agent_profile_counting_service/rocprofiler_sample_device_counting_service/g' {} +` - Changed naming of "dispatch profiling service" to a more descriptive "dispatch counting service". To convert existing tool or user code to the new names, the following sed can be used: `-type f -exec sed -i -e 's/dispatch_profile_counting_service/dispatch_counting_service/g' -e 's/dispatch_profile.h/dispatch_counting_service.h/g' -e 's/rocprofiler_profile_counting_dispatch_callback_t/rocprofiler_dispatch_counting_service_callback_t/g' -e 's/rocprofiler_profile_counting_dispatch_data_t/rocprofiler_dispatch_counting_service_data_t/g' -e 's/rocprofiler_profile_counting_dispatch_record_t/rocprofiler_dispatch_counting_service_record_t/g' {} +` - `FETCH_SIZE` metric on gfx94x now uses `TCC_BUBBLE` for 128B reads. - PMC dispatch-based counter collection serialization is now per-device instead of being global across all devices. - Added output return functionality to rocprofiler_sample_device_counting_service - Added rocprofiler_load_counter_definition. ### Resolved issues - Create subdirectory when `rocprofv3 --output-file` includes a folder path - Fixed misaligned stores (undefined behavior) for buffer records - Fixed crash when only scratch reporting is enabled - Fixed `MeanOccupancy` metrics - Fixed aborted-application validation test to properly check for `hipExtHostAlloc` command - Fixed implicit reduction of SQ and GRBM metrics - Fixed support for derived counters in reduce operation - Bug fixed in max-in-reduce operation - Introduced fix to handle a range of values for `select()` dimension in expressions parser - Conditional `aql::set_profiler_active_on_queue` only when counter collection is registered (resolves Navi3 kernel tracing issues) ### Removed - Removed gfx8 metric definitions - Removed `rocprofv3` installation to sbin directory ## ROCprofiler-SDK 0.6.0 for ROCm release 6.4 ### Added - Support for `select()` operation in counter expression. - `reduce()` operation for counter expression with respect to dimension. - `--collection-period` feature in `rocprofv3` to enable filtering using time. - `--collection-period-unit` feature in `rocprofv3` to control time units used in collection period option. - Deprecation notice for ROCProfiler and ROCProfilerV2. - Support for rocDecode API Tracing - Usage documentation for ROCTx - Usage documentation for MPI applications - SDK: `rocprofiler_agent_v0_t` support for agent UUIDs - SDK: `rocprofiler_agent_v0_t` support for agent visibility based on gpu isolation environment variables such as `ROCR_VISIBLE_DEVICES` and so on. - Accumulation VGPR support for `rocprofv3`. - Host-trap based PC sampling support for rocprofv3. - Support for OpenMP tool. ## ROCprofiler-SDK 1.0.0 for ROCm release 7.0 ### Added - Support for [rocJPEG](https://rocm.docs.amd.com/projects/rocJPEG/en/latest/index.html) API Tracing. - Support for AMD Instinct MI350X and MI355X accelerators. - `rocprofiler_create_counter` to facilitate adding custom derived counters at runtime. - Support in `rocprofv3` for iteration based counter multiplexing. - Perfetto support for counter collection. - Support for negating `rocprofv3` tracing options when using aggregate options such as `--sys-trace --hsa-trace=no`. - `--agent-index` option in `rocprofv3` to specify the agent naming convention in the output: - absolute == node_id - relative == logical_node_id - type-relative == logical_node_type_id - MI300 and MI350 stochastic (hardware-based) PC sampling support in ROCProfiler-SDK and `rocprofv3`. - Python bindings for `rocprofiler-sdk-roctx` - SQLite3 output support for `rocprofv3` using `--output-format rocpd`. - `rocprofiler-sdk-rocpd` package: - Public API in `include/rocprofiler-sdk-rocpd/rocpd.h`. - Library implementation in `librocprofiler-sdk-rocpd.so`. - Support for `find_package(rocprofiler-sdk-rocpd)`. - `rocprofiler-sdk-rocpd` DEB and RPM packages. - `--version` option in `rocprofv3`. - `rocpd` Python package. - Thread trace as experimental API. - ROCprof Trace Decoder as experimental API: - Requires [ROCprof Trace Decoder plugin](https://github.com/rocm/rocprof-trace-decoder). - Thread trace option in the `rocprofv3` tool under the `--att` parameters: - See [using thread trace with rocprofv3](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/amd-mainline/how-to/using-thread-trace.html) - Requires [ROCprof Trace Decoder plugin](https://github.com/rocm/rocprof-trace-decoder). - `rocpd` output format documentation: - Requires [ROCprof Trace Decoder plugin](https://github.com/rocm/rocprof-trace-decoder). - Perfetto support for scratch memory. - Support in the `rocprofv3` avail tool for command-line arguments. - Documentation for `rocprofv3` advanced options. - Support for multi dispatch ATT file added ### Changed - SDK to NOT to create a background thread when every tool returns a nullptr from `rocprofiler_configure`. - `vaddr-to-file-offset` mapping in `disassembly.hpp` to use the dedicated comgr API. - `rocprofiler_uuid_t` ABI to hold 128 bit value. - `rocprofv3` shorthand argument for `--collection-period` to `-P` (upper-case) while `-p` (lower-case) is reserved for later use. - Default output format for `rocprofv3` to `rocpd` (SQLite3 database). - `rocprofv3` avail tool to be renamed from `rocprofv3_avail` to `rocprofv3-avail` tool. - `rocprofv3` tool to facilitate thread trace and PC sampling on the same agent. ### Resolved issues - Fixed missing callbacks around internal thread creation within counter collection service. - Fixed potential data race in the ROCprofiler-SDK double buffering scheme. - Fixed usage of std::regex in the core ROCprofiler-SDK library that caused segfaults or exceptions when used under dual ABI. - Fixed Perfetto counter collection by introducing accumulation per dispatch. - Fixed code object disassembly for missing function inlining information. - Fixed queue preemption error and `HSA_STATUS_ERROR_INVALID_PACKET_FORMAT` error for stochastic PC-sampling in MI300X, leading to stabler runs. - Fixed the system hang issue for host-trap PC-sampling on MI300X. - Fixed `rocpd` counter collection issue when counter collection alone is enabled. `rocpd_kernel_dispatch` table is updated to be populated by counters data instead of kernel_dispatch data. - Fixed `rocprofiler_*_id_t` structs for inconsistency related to a "null" handle: - The correct definition for a null handle is `.handle = 0` while some definitions previously used `UINT64_MAX`. - Fixed kernel trace csv output generated by `rocpd`. ### Removed - Support for compilation of gfx940 and gfx941 targets. ## ROCprofiler-SDK 1.1.0 for ROCm release 7.1 ### Added - Dynamic process attachment- ROCprofiler-sdk and `rocprofv3` now facilitate dynamic profiling of a running GPU applications by attaching to its process ID (PID), rather than launching the application through the profiler itself. - Scratch-memory trace information to the Perfetto output in `rocprofv3`. - New capabilities to the thread trace support in `rocprofv3`, including real-time clock support for thread trace alignment on gfx9 architecture. This enables high-resolution clock computation and better synchronization across shader engines. Additionally, `MultiKernelDispatch` thread trace support is now available across all ASICs. - Documentation for dynamic process attachment. - Documentation for `rocpd` summaries. ### Optimized - Improved the stability and robustness of the `rocpd` output. ## ROCprofiler-SDK 1.1.0 for ROCm release 7.2 ### Added - Counter collection support for `gfx1150` and `gfx1151`. - HSA Extension API v8 support. - `hipStreamCopyAttributes` API implementation. ### Optimized - Improved process attachment and updated the corresponding [documentation](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/how-to/using-rocprofv3-process-attachment.html). - Improved [Quick reference guide for rocprofv3] (https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/quick_guide.html). - Updated installation documentation with links to the latest repository (https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/install/installation.html). ### Resolved issues - Fixed multi-GPU dimension mismatch. - Fixed device lock issue for dispatch counters. - Addressed OpenMP Tools task scheduling null pointer exception. - Fixed stream ID errors arising during process attachment. - Fixed issues arising during dynamic code object loading.