Changelog for ROCprofiler-SDK
Full documentation for ROCprofiler-SDK is available at rocm.docs.amd.com/projects/rocprofiler-sdk
ROCprofiler-SDK for AFAR I
Added
- HSA API tracing
- Kernel dispatch tracing
- Kernel dispatch counter collection
- Instances reported as single dimension
- No serialization
ROCprofiler-SDK for AFAR II
Added
- HIP API tracing
- ROCTx tracing
- Tracing ROCProf Tool V3
- Documentation packaging
- ROCTx control (start and stop)
- Memory copy tracing
ROCprofiler-SDK for AFAR III
Added
- Kernel dispatch counter collection. This includes serialization and multidimensional instances.
- Kernel serialization.
- Serialization control (on and off).
- ROCprof tool plugin interface V3 for counters and dimensions.
- Support to list metrics.
- Correlation-Id retirement
- HIP and HSA trace distinction:
  - --hip-runtime-trace: collects HIP Runtime API traces
  - --hip-compiler-trace: collects HIP compiler-generated code traces
  - --hsa-core-trace: collects HSA API traces (core API)
  - --hsa-amd-trace: collects HSA API traces (AMD-extension API)
  - --hsa-image-trace: collects HSA API traces (image-extension API)
  - --hsa-finalizer-trace: collects HSA API traces (finalizer-extension API)
ROCprofiler-SDK for AFAR IV
Added
API:
- Page migration reporting
- Scratch memory reporting
- Kernel dispatch callback tracing
- External correlation Id request service
- Buffered counter collection record headers
- Option to remove HSA dependency from counter collection
Tool:
- rocprofv3 multi-GPU support in a single process
ROCprofiler-SDK for AFAR V
Added
API:
- Agent or device counter collection
- PC sampling (beta)
Tool:
- Single JSON output format support
- Perfetto output format support (.pftrace)
- Input YAML support for counter collection
- Input JSON support for counter collection
- Application replay in counter collection
- rocprofv3 multi-GPU support:
  - Multiprocess (multiple files)
Changed
- The rocprofv3 tool now requires specifying -- before the application. For detailed use, see Using rocprofv3.
Resolved issues
- Fixed SQ_ACCUM_PREV and SQ_ACCUM_PREV_HIRE overwriting issue
ROCprofiler-SDK 0.4.0 for ROCm release 6.2 (AFAR VI)
Added
- OTF2 tool support
- Kernel and range filtering
- Counter collection definitions in YAML
- Documentation updates (SQ block, counter collection, tracing, tool usage)
- rocprofv3 option --kernel-rename
- rocprofv3 options for Perfetto settings (buffer size and so on)
- CSV columns for kernel trace
  - Thread_Id
  - Dispatch_Id
- CSV column for counter collection
ROCprofiler-SDK 0.5.0 for ROCm release 6.3 (AFAR VII)
Added
- Start and end timestamp columns to the counter collection csv output
- Check to force tools to initialize context id with zero
- Support to specify hardware counters for collection using rocprofv3 as rocprofv3 --pmc [COUNTER [COUNTER ...]]
- Memory Allocation Tracing
- PC sampling tool support with CSV and JSON output formats
- List supported PC Sampling Configurations
Changed
- --marker-trace option for rocprofv3 now supports the legacy ROCTx library libroctx64.so when the application is linked against the new library librocprofiler-sdk-roctx.so.
- Replaced deprecated hipHostMalloc and hipHostFree functions with hipExtHostAlloc and hipFreeHost for ROCm versions starting 6.3.
- Updated rocprofv3 --help options.
- Changed naming of "agent profiling" to a more descriptive "device counting service". To convert existing tool or user code to the new names, use the following sed command:
    find . -type f -exec sed -i 's/rocprofiler_agent_profile_callback_t/rocprofiler_device_counting_service_callback_t/g; s/rocprofiler_configure_agent_profile_counting_service/rocprofiler_configure_device_counting_service/g; s/agent_profile.h/device_counting_service.h/g; s/rocprofiler_sample_agent_profile_counting_service/rocprofiler_sample_device_counting_service/g' {} +
- Changed naming of "dispatch profiling service" to a more descriptive "dispatch counting service". To convert existing tool or user code to the new names, use the following sed command:
    find . -type f -exec sed -i -e 's/dispatch_profile_counting_service/dispatch_counting_service/g' -e 's/dispatch_profile.h/dispatch_counting_service.h/g' -e 's/rocprofiler_profile_counting_dispatch_callback_t/rocprofiler_dispatch_counting_service_callback_t/g' -e 's/rocprofiler_profile_counting_dispatch_data_t/rocprofiler_dispatch_counting_service_data_t/g' -e 's/rocprofiler_profile_counting_dispatch_record_t/rocprofiler_dispatch_counting_service_record_t/g' {} +
- FETCH_SIZE metric on gfx94x now uses TCC_BUBBLE for 128B reads.
- PMC dispatch-based counter collection serialization is now per-device instead of being global across all devices.
- Added output return functionality to rocprofiler_sample_device_counting_service
- Added rocprofiler_load_counter_definition.
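Where running sed in place is not convenient, the two renames above can be sketched in Python. This is a minimal sketch and not part of ROCprofiler-SDK: the function name rename_counting_service_symbols is hypothetical, and the old and new names are copied from the sed expressions in this section. Unlike sed, it matches the "." in the header names literally.

```python
# Old -> new names, copied from the sed expressions in this changelog.
# The order mirrors the order of the sed substitutions.
RENAMES = [
    ("rocprofiler_agent_profile_callback_t",
     "rocprofiler_device_counting_service_callback_t"),
    ("rocprofiler_configure_agent_profile_counting_service",
     "rocprofiler_configure_device_counting_service"),
    ("agent_profile.h", "device_counting_service.h"),
    ("rocprofiler_sample_agent_profile_counting_service",
     "rocprofiler_sample_device_counting_service"),
    ("dispatch_profile_counting_service", "dispatch_counting_service"),
    ("dispatch_profile.h", "dispatch_counting_service.h"),
    ("rocprofiler_profile_counting_dispatch_callback_t",
     "rocprofiler_dispatch_counting_service_callback_t"),
    ("rocprofiler_profile_counting_dispatch_data_t",
     "rocprofiler_dispatch_counting_service_data_t"),
    ("rocprofiler_profile_counting_dispatch_record_t",
     "rocprofiler_dispatch_counting_service_record_t"),
]


def rename_counting_service_symbols(source: str) -> str:
    """Return `source` with every old name replaced by its new name."""
    for old, new in RENAMES:
        source = source.replace(old, new)
    return source
```

Applying the function to the text of each source file and writing the result back would approximate the find and sed pipeline; reviewing the resulting diff before committing is advisable.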
Resolved issues
- Created subdirectory when rocprofv3 --output-file includes a folder path
- Fixed misaligned stores (undefined behavior) for buffer records
- Fixed crash when only scratch reporting is enabled
- Fixed MeanOccupancy metrics
- Fixed aborted-application validation test to properly check for hipExtHostAlloc command
- Fixed implicit reduction of SQ and GRBM metrics
- Fixed support for derived counters in reduce operation
- Fixed bug in max-in-reduce operation
- Introduced fix to handle a range of values for select() dimension in expressions parser
- Made aql::set_profiler_active_on_queue conditional on counter collection being registered (resolves Navi3 kernel tracing issues)
Removed
- Removed gfx8 metric definitions
- Removed rocprofv3 installation to sbin directory
ROCprofiler-SDK 0.6.0 for ROCm release 6.4
Added
- Support for select() operation in counter expression.
- reduce() operation for counter expression with respect to dimension.
- --collection-period feature in rocprofv3 to enable filtering using time.
- --collection-period-unit feature in rocprofv3 to control time units used in collection period option.
- Deprecation notice for ROCProfiler and ROCProfilerV2.
- Support for rocDecode API Tracing
- Usage documentation for ROCTx
- Usage documentation for MPI applications
- SDK: rocprofiler_agent_v0_t support for agent UUIDs
- SDK: rocprofiler_agent_v0_t support for agent visibility based on GPU isolation environment variables such as ROCR_VISIBLE_DEVICES and so on.
- Accumulation VGPR support for rocprofv3.
- Host-trap based PC sampling support for rocprofv3.
- Support for OpenMP tool.
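To illustrate what selecting or reducing a counter "with respect to a dimension" means, here is a minimal Python sketch. It is a conceptual model only, not the SDK's expression parser or its syntax: the dimension names ("engine", "instance"), the sample values, and the helpers select and reduce_over are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical counter instances: each value is tagged with its index
# along every dimension (dimension names are made up for this sketch).
samples = [
    ({"engine": 0, "instance": 0}, 10),
    ({"engine": 0, "instance": 1}, 20),
    ({"engine": 1, "instance": 0}, 5),
    ({"engine": 1, "instance": 1}, 7),
]


def select(instances, dim, indices):
    """Keep only the instances whose index along `dim` is in `indices`."""
    wanted = set(indices)
    return [(dims, value) for dims, value in instances if dims[dim] in wanted]


def reduce_over(instances, dim, op=sum):
    """Collapse `dim`, combining values that agree on all other dimensions."""
    groups = defaultdict(list)
    for dims, value in instances:
        key = tuple(sorted((k, i) for k, i in dims.items() if k != dim))
        groups[key].append(value)
    return [(dict(key), op(values)) for key, values in groups.items()]


per_engine_sum = reduce_over(samples, "instance")  # one value per engine
engine1_max = reduce_over(select(samples, "engine", [1]), "instance", max)
```

In this model, a reduction removes one dimension from the result, while a selection keeps the dimension but narrows the set of indices along it.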
ROCprofiler-SDK 1.0.0 for ROCm release 7.0
Added
- Support for rocJPEG API Tracing.
- Support for AMD Instinct MI350X and MI355X accelerators.
- rocprofiler_create_counter to facilitate adding custom derived counters at runtime.
- Support in rocprofv3 for iteration based counter multiplexing.
- Perfetto support for counter collection.
- Support for negating rocprofv3 tracing options when using aggregate options such as --sys-trace --hsa-trace=no.
- --agent-index option in rocprofv3 to specify the agent naming convention in the output:
  - absolute == node_id
  - relative == logical_node_id
  - type-relative == logical_node_type_id
- MI300 and MI350 stochastic (hardware-based) PC sampling support in ROCProfiler-SDK and rocprofv3.
- Python bindings for rocprofiler-sdk-roctx
- SQLite3 output support for rocprofv3 using --output-format rocpd.
- rocprofiler-sdk-rocpd package:
  - Public API in include/rocprofiler-sdk-rocpd/rocpd.h.
  - Library implementation in librocprofiler-sdk-rocpd.so.
  - Support for find_package(rocprofiler-sdk-rocpd).
  - rocprofiler-sdk-rocpd DEB and RPM packages.
- --version option in rocprofv3.
- rocpd Python package.
- Thread trace as experimental API.
- ROCprof Trace Decoder as experimental API:
  - Requires ROCprof Trace Decoder plugin.
- Thread trace option in the rocprofv3 tool under the --att parameters:
- rocpd output format documentation:
  - Requires ROCprof Trace Decoder plugin.
- Perfetto support for scratch memory.
- Support in the rocprofv3 avail tool for command-line arguments.
- Documentation for rocprofv3 advanced options.
- Support for multi-dispatch ATT files.
Changed
- SDK to not create a background thread when every tool returns a nullptr from rocprofiler_configure.
- vaddr-to-file-offset mapping in disassembly.hpp to use the dedicated comgr API.
- rocprofiler_uuid_t ABI to hold a 128-bit value.
- rocprofv3 shorthand argument for --collection-period to -P (upper-case) while -p (lower-case) is reserved for later use.
- Default output format for rocprofv3 to rocpd (SQLite3 database).
- rocprofv3 avail tool to be renamed from rocprofv3_avail to rocprofv3-avail.
- rocprofv3 tool to facilitate thread trace and PC sampling on the same agent.
Resolved issues
- Fixed missing callbacks around internal thread creation within counter collection service.
- Fixed potential data race in the ROCprofiler-SDK double buffering scheme.
- Fixed usage of std::regex in the core ROCprofiler-SDK library that caused segfaults or exceptions when used under dual ABI.
- Fixed Perfetto counter collection by introducing accumulation per dispatch.
- Fixed code object disassembly for missing function inlining information.
- Fixed queue preemption error and HSA_STATUS_ERROR_INVALID_PACKET_FORMAT error for stochastic PC-sampling on MI300X, leading to more stable runs.
- Fixed the system hang issue for host-trap PC-sampling on MI300X.
- Fixed rocpd counter collection issue when counter collection alone is enabled. The rocpd_kernel_dispatch table is now populated by counters data instead of kernel_dispatch data.
- Fixed rocprofiler_*_id_t structs for inconsistency related to a "null" handle:
  - The correct definition for a null handle is .handle = 0 while some definitions previously used UINT64_MAX.
- Fixed kernel trace CSV output generated by rocpd.
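The null-handle convention above can be sketched as follows. This is an illustrative Python model built on ctypes, not the SDK's headers: the struct name example_id_t and the helper is_null are hypothetical stand-ins for the rocprofiler_*_id_t family, and the layout assumes only that each struct wraps a single 64-bit handle field.

```python
import ctypes

UINT64_MAX = 2**64 - 1


class example_id_t(ctypes.Structure):
    # Hypothetical stand-in for a rocprofiler_*_id_t struct (assumed layout).
    _fields_ = [("handle", ctypes.c_uint64)]


def is_null(ident: example_id_t) -> bool:
    """Treat a handle as null only when it equals zero."""
    return ident.handle == 0


null_id = example_id_t()             # ctypes zero-initializes the field
legacy_id = example_id_t(UINT64_MAX) # value some definitions used before the fix
```

Under this convention a zero-initialized struct is already a valid null handle, whereas code that compares against UINT64_MAX would no longer recognize it as null.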
Removed
- Support for compilation of gfx940 and gfx941 targets.
ROCprofiler-SDK 1.1.0 for ROCm release 7.1
Added
- Dynamic process attachment: ROCprofiler-SDK and rocprofv3 now support profiling a running GPU application by attaching to its process ID (PID), rather than launching the application through the profiler itself.
- Scratch-memory trace information to the Perfetto output in rocprofv3.
- New capabilities to the thread trace support in rocprofv3, including real-time clock support for thread trace alignment on gfx9 architecture. This enables high-resolution clock computation and better synchronization across shader engines. Additionally, MultiKernelDispatch thread trace support is now available across all ASICs.
- Documentation for dynamic process attachment.
- Documentation for rocpd summaries.
Optimized
- Improved the stability and robustness of the rocpd output.
ROCprofiler-SDK 1.1.0 for ROCm release 7.2
Added
- Counter collection support for gfx1150 and gfx1151 (Strix Halo).
- HSA Extension API v8 support.
- hipStreamCopyAttributes API implementation.
Optimized
- Improved process attachment and updated the corresponding documentation.
- Improved [Quick reference guide for rocprofv3](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/quick_guide.html).
- Updated installation documentation with links to the latest repository (https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/install/installation.html).
Resolved issues
- Fixed multi-GPU dimension mismatch.
- Fixed device lock issue for dispatch counters.
- Addressed OpenMP Tools task scheduling null pointer exception.
- Fixed stream ID errors arising during process attachment.
- Fixed issues arising during dynamic code object loading.