# Changelog for ROCprofiler-SDK
Full documentation for ROCprofiler-SDK is available at [rocm.docs.amd.com/projects/rocprofiler-sdk](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/index.html)
## ROCprofiler-SDK for AFAR I
### Added
- HSA API tracing
- Kernel dispatch tracing
- Kernel dispatch counter collection
  - Instances reported as single dimension
  - No serialization
## ROCprofiler-SDK for AFAR II
### Added
- HIP API tracing
- ROCTx tracing
- Tracing ROCProf Tool V3
- Documentation packaging
- ROCTx control (start and stop)
- Memory copy tracing
## ROCprofiler-SDK for AFAR III
### Added
- Kernel dispatch counter collection. This includes serialization and multidimensional instances.
- Kernel serialization.
- Serialization control (on and off).
- ROCprof tool plugin interface V3 for counters and dimensions.
- Support to list metrics.
- Correlation-Id retirement
- HIP and HSA trace distinction:
  - `--hip-runtime-trace`: collects HIP Runtime API traces
  - `--hip-compiler-trace`: collects HIP compiler-generated code traces
  - `--hsa-core-trace`: collects HSA API traces (core API)
  - `--hsa-amd-trace`: collects HSA API traces (AMD-extension API)
  - `--hsa-image-trace`: collects HSA API traces (image-extension API)
  - `--hsa-finalizer-trace`: collects HSA API traces (finalizer-extension API)
## ROCprofiler-SDK for AFAR IV
### Added
**API:**
- Page migration reporting
- Scratch memory reporting
- Kernel dispatch callback tracing
- External correlation Id request service
- Buffered counter collection record headers
- Option to remove HSA dependency from counter collection

**Tool:**
- `rocprofv3` multi-GPU support in a single-process
## ROCprofiler-SDK for AFAR V
### Added
**API:**
- Agent or device counter collection
- PC sampling (beta)

**Tool:**
- Single JSON output format support
- Perfetto output format support (.pftrace)
- Input YAML support for counter collection
- Input JSON support for counter collection
- Application replay in counter collection
- `rocprofv3` multi-GPU support:
  - Multiprocess (multiple files)
### Changed
- `rocprofv3` now requires `--` before the application command. For detailed usage, see [Using rocprofv3](source/docs/how-to/using-rocprofv3.rst).
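
As a rough illustration of why the `--` separator matters, the sketch below assembles a command line in which the application takes its own option. The binary `./my_hip_app` and its `--iterations` option are hypothetical placeholders; `--hip-trace` is used as a representative `rocprofv3` tracing option. The command runs only if both the profiler and the application are present.

```shell
#!/bin/sh
# Hypothetical application binary, used only for illustration.
app=./my_hip_app

# Everything before "--" is parsed by rocprofv3; everything after it is the
# application command line, so "--iterations 10" reaches the application.
set -- rocprofv3 --hip-trace -- "$app" --iterations 10
cmd_line="$*"
echo "$cmd_line"

# Run only when both the profiler and the application are actually available.
if command -v rocprofv3 >/dev/null 2>&1 && [ -x "$app" ]; then
    "$@"
fi
```

Without the separator, an application option such as `--iterations` could be mistaken for a profiler option.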
### Resolved issues
- Fixed `SQ_ACCUM_PREV` and `SQ_ACCUM_PREV_HIRES` overwriting issue
## ROCprofiler-SDK 0.4.0 for ROCm release 6.2 (AFAR VI)
### Added
- OTF2 tool support
- Kernel and range filtering
- Counter collection definitions in YAML
- Documentation updates (SQ block, counter collection, tracing, tool usage)
- `rocprofv3` option `--kernel-rename`
- `rocprofv3` options for Perfetto settings (buffer size and so on)
- CSV columns for kernel trace
  - `Thread_Id`
  - `Dispatch_Id`
- CSV column for counter collection
## ROCprofiler-SDK 0.5.0 for ROCm release 6.3 (AFAR VII)
### Added
- Start and end timestamp columns to the counter collection CSV output
- Check to force tools to initialize context id with zero
- Support to specify hardware counters for collection using rocprofv3 as `rocprofv3 --pmc [COUNTER [COUNTER ...]]`
- Memory Allocation Tracing
- PC sampling tool support with CSV and JSON output formats
- List supported PC Sampling Configurations
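
The `--pmc` syntax listed above takes a space-separated list of counters. A minimal sketch of such an invocation follows; the counter names `SQ_WAVES` and `GRBM_GUI_ACTIVE` are examples whose availability depends on the GPU, and `./my_hip_app` is a hypothetical application binary. The command is only printed unless both the profiler and the application are present.

```shell
#!/bin/sh
# Hypothetical application binary, used only for illustration.
app=./my_hip_app
# Example counter names; whether they exist depends on the target GPU.
counters="SQ_WAVES GRBM_GUI_ACTIVE"

# $counters is intentionally unquoted so that each name becomes its own
# argument; "--" ends the counter list and starts the application command.
set -- rocprofv3 --pmc $counters -- "$app"
pmc_cmd="$*"
echo "$pmc_cmd"

# Run only when both the profiler and the application are actually available.
if command -v rocprofv3 >/dev/null 2>&1 && [ -x "$app" ]; then
    "$@"
fi
```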
### Changed
- `--marker-trace` option for `rocprofv3` now supports the legacy ROCTx library `libroctx64.so` when the application is linked against the new library `librocprofiler-sdk-roctx.so`.
- Replaced deprecated `hipHostMalloc` and `hipHostFree` functions with `hipExtHostAlloc` and `hipFreeHost` for ROCm versions starting 6.3.
- Updated `rocprofv3` `--help` options.
- Changed naming of "agent profiling" to a more descriptive "device counting service". To convert existing tool or user code to the new names, use the following `sed` command:
  `find . -type f -exec sed -i 's/rocprofiler_agent_profile_callback_t/rocprofiler_device_counting_service_callback_t/g; s/rocprofiler_configure_agent_profile_counting_service/rocprofiler_configure_device_counting_service/g; s/agent_profile.h/device_counting_service.h/g; s/rocprofiler_sample_agent_profile_counting_service/rocprofiler_sample_device_counting_service/g' {} +`
- Changed naming of "dispatch profiling service" to a more descriptive "dispatch counting service". To convert existing tool or user code to the new names, use the following `sed` command:
  `find . -type f -exec sed -i -e 's/dispatch_profile_counting_service/dispatch_counting_service/g' -e 's/dispatch_profile.h/dispatch_counting_service.h/g' -e 's/rocprofiler_profile_counting_dispatch_callback_t/rocprofiler_dispatch_counting_service_callback_t/g' -e 's/rocprofiler_profile_counting_dispatch_data_t/rocprofiler_dispatch_counting_service_data_t/g' -e 's/rocprofiler_profile_counting_dispatch_record_t/rocprofiler_dispatch_counting_service_record_t/g' {} +`
- `FETCH_SIZE` metric on gfx94x now uses `TCC_BUBBLE` for 128B reads.
- PMC dispatch-based counter collection serialization is now per-device instead of being global across all devices.
- Added output return functionality to `rocprofiler_sample_device_counting_service`.
- Added `rocprofiler_load_counter_definition`.
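
Before running the rename commands above on a real source tree, it may help to try them on a throwaway copy. The sketch below applies the dispatch-counting rename expressions from this changelog to a single file in a temporary directory. The file name `tool.cpp` and its contents are hypothetical, and `sed -i` without a suffix assumes GNU sed.

```shell
#!/bin/sh
# Work in a throwaway directory; nothing outside it is modified.
demo_dir=$(mktemp -d)

# Hypothetical tool source that still uses the pre-rename identifiers.
cat > "$demo_dir/tool.cpp" <<'EOF'
#include <rocprofiler-sdk/dispatch_profile.h>
rocprofiler_profile_counting_dispatch_callback_t cb;
rocprofiler_profile_counting_dispatch_data_t data;
EOF

# Rename expressions copied from the changelog entry (GNU sed assumed for -i).
find "$demo_dir" -type f -exec sed -i \
    -e 's/dispatch_profile_counting_service/dispatch_counting_service/g' \
    -e 's/dispatch_profile.h/dispatch_counting_service.h/g' \
    -e 's/rocprofiler_profile_counting_dispatch_callback_t/rocprofiler_dispatch_counting_service_callback_t/g' \
    -e 's/rocprofiler_profile_counting_dispatch_data_t/rocprofiler_dispatch_counting_service_data_t/g' \
    -e 's/rocprofiler_profile_counting_dispatch_record_t/rocprofiler_dispatch_counting_service_record_t/g' {} +

# Show the converted file for inspection.
cat "$demo_dir/tool.cpp"
```

Running the same `find` expression from the root of a checked-out project (with `.` in place of `$demo_dir`) rewrites files in place, so committing or backing up beforehand is advisable.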
### Resolved issues
- Fixed subdirectory creation when `rocprofv3 --output-file` includes a folder path
- Fixed misaligned stores (undefined behavior) for buffer records
- Fixed crash when only scratch reporting is enabled
- Fixed `MeanOccupancy` metrics
- Fixed aborted-application validation test to properly check for `hipExtHostAlloc` command
- Fixed implicit reduction of SQ and GRBM metrics
- Fixed support for derived counters in reduce operation
- Fixed bug in max-in-reduce operation
- Fixed handling of a range of values for the `select()` dimension in the expression parser
- `aql::set_profiler_active_on_queue` is now called only when counter collection is registered (resolves Navi3 kernel tracing issues)
### Removed
- Removed gfx8 metric definitions
- Removed `rocprofv3` installation to sbin directory
## ROCprofiler-SDK 0.6.0 for ROCm release 6.4
### Added
- Support for `select()` operation in counter expression.
- `reduce()` operation for counter expression with respect to dimension.
- `--collection-period` feature in `rocprofv3` to enable filtering using time.
- `--collection-period-unit` feature in `rocprofv3` to control time units used in collection period option.
- Deprecation notice for ROCProfiler and ROCProfilerV2.
- Support for rocDecode API Tracing
- Usage documentation for ROCTx
- Usage documentation for MPI applications
- SDK: `rocprofiler_agent_v0_t` support for agent UUIDs
- SDK: `rocprofiler_agent_v0_t` support for agent visibility based on GPU isolation environment variables such as `ROCR_VISIBLE_DEVICES`.
- Accumulation VGPR support for `rocprofv3`.
- Host-trap based PC sampling support for `rocprofv3`.
- Support for OpenMP tool.
## ROCprofiler-SDK 1.0.0 for ROCm release 7.0
### Added
- Support for [rocJPEG](https://rocm.docs.amd.com/projects/rocJPEG/en/latest/index.html) API Tracing.
- Support for AMD Instinct MI350X and MI355X accelerators.
- `rocprofiler_create_counter` to facilitate adding custom derived counters at runtime.
- Support in `rocprofv3` for iteration based counter multiplexing.
- Perfetto support for counter collection.
- Support for negating `rocprofv3` tracing options when using aggregate options such as `--sys-trace --hsa-trace=no`.
- `--agent-index` option in `rocprofv3` to specify the agent naming convention in the output:
  - absolute == node_id
  - relative == logical_node_id
  - type-relative == logical_node_type_id
- MI300 and MI350 stochastic (hardware-based) PC sampling support in ROCProfiler-SDK and `rocprofv3`.
- Python bindings for `rocprofiler-sdk-roctx`
- SQLite3 output support for `rocprofv3` using `--output-format rocpd`.
- `rocprofiler-sdk-rocpd` package:
  - Public API in `include/rocprofiler-sdk-rocpd/rocpd.h`.
  - Library implementation in `librocprofiler-sdk-rocpd.so`.
  - Support for `find_package(rocprofiler-sdk-rocpd)`.
  - `rocprofiler-sdk-rocpd` DEB and RPM packages.
- `--version` option in `rocprofv3`.
- `rocpd` Python package.
- Thread trace as experimental API.
- ROCprof Trace Decoder as experimental API:
  - Requires [ROCprof Trace Decoder plugin](https://github.com/rocm/rocprof-trace-decoder).
- Thread trace option in the `rocprofv3` tool under the `--att` parameters:
  - See [using thread trace with rocprofv3](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/amd-mainline/how-to/using-thread-trace.html)
  - Requires [ROCprof Trace Decoder plugin](https://github.com/rocm/rocprof-trace-decoder).
- `rocpd` output format documentation:
  - Requires [ROCprof Trace Decoder plugin](https://github.com/rocm/rocprof-trace-decoder).
- Perfetto support for scratch memory.
- Support in the `rocprofv3` avail tool for command-line arguments.
- Documentation for `rocprofv3` advanced options.
- Support for multi-dispatch ATT files.
### Changed
- SDK to NOT create a background thread when every tool returns a nullptr from `rocprofiler_configure`.
- `vaddr-to-file-offset` mapping in `disassembly.hpp` to use the dedicated comgr API.
- `rocprofiler_uuid_t` ABI to hold a 128-bit value.
- `rocprofv3` shorthand argument for `--collection-period` to `-P` (upper-case) while `-p` (lower-case) is reserved for later use.
- Default output format for `rocprofv3` to `rocpd` (SQLite3 database).
- `rocprofv3` avail tool name from `rocprofv3_avail` to `rocprofv3-avail`.
- `rocprofv3` tool to facilitate thread trace and PC sampling on the same agent.
### Resolved issues
- Fixed missing callbacks around internal thread creation within counter collection service.
- Fixed potential data race in the ROCprofiler-SDK double buffering scheme.
- Fixed usage of `std::regex` in the core ROCprofiler-SDK library that caused segfaults or exceptions when used under dual ABI.
- Fixed Perfetto counter collection by introducing accumulation per dispatch.
- Fixed code object disassembly for missing function inlining information.
- Fixed queue preemption error and `HSA_STATUS_ERROR_INVALID_PACKET_FORMAT` error for stochastic PC sampling on MI300X, leading to more stable runs.
- Fixed the system hang issue for host-trap PC-sampling on MI300X.
- Fixed `rocpd` counter collection issue when counter collection alone is enabled. The `rocpd_kernel_dispatch` table is now populated from counter data instead of `kernel_dispatch` data.
- Fixed `rocprofiler_*_id_t` structs for inconsistency related to a "null" handle:
  - The correct definition for a null handle is `.handle = 0` while some definitions previously used `UINT64_MAX`.
- Fixed kernel trace CSV output generated by `rocpd`.
### Removed
- Support for compilation of gfx940 and gfx941 targets.
## ROCprofiler-SDK 1.1.0 for ROCm release 7.1
### Added
- Dynamic process attachment: ROCprofiler-SDK and `rocprofv3` now support dynamic profiling of a running GPU application by attaching to its process ID (PID), rather than launching the application through the profiler itself.
- Scratch-memory trace information to the Perfetto output in `rocprofv3`.
- New capabilities to the thread trace support in `rocprofv3`, including real-time clock support for thread trace alignment on gfx9 architecture. This enables high-resolution clock computation and better synchronization across shader engines. Additionally, `MultiKernelDispatch` thread trace support is now available across all ASICs.
- Documentation for dynamic process attachment.
- Documentation for `rocpd` summaries.
### Optimized
- Improved the stability and robustness of the `rocpd` output.
## ROCprofiler-SDK 1.1.0 for ROCm release 7.2
### Added
- Counter collection support for `gfx1150` and `gfx1151`.
- HSA Extension API v8 support.
- `hipStreamCopyAttributes` API implementation.
### Optimized
- Improved process attachment and updated the corresponding [documentation](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/how-to/using-rocprofv3-process-attachment.html).
- Improved [Quick reference guide for rocprofv3](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/quick_guide.html).
- Updated [installation documentation](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/install/installation.html) with links to the latest repository.
### Resolved issues
- Fixed multi-GPU dimension mismatch.
- Fixed device lock issue for dispatch counters.
- Addressed OpenMP Tools task scheduling null pointer exception.
- Fixed stream ID errors arising during process attachment.
- Fixed issues arising during dynamic code object loading.