Changelog for ROCprofiler-SDK
Full documentation for ROCprofiler-SDK is available at rocm.docs.amd.com/projects/rocprofiler-sdk
ROCprofiler-SDK for AFAR I
Added
- HSA API tracing
- Kernel dispatch tracing
- Kernel dispatch counter collection
  - Instances reported as single dimension
  - No serialization
ROCprofiler-SDK for AFAR II
Added
- HIP API tracing
- ROCTx tracing
- Tracing ROCProf Tool V3
- Documentation packaging
- ROCTx control (start and stop)
- Memory copy tracing
ROCprofiler-SDK for AFAR III
Added
- Kernel dispatch counter collection. This includes serialization and multidimensional instances.
- Kernel serialization.
- Serialization control (on and off).
- ROCprof tool plugin interface V3 for counters and dimensions.
- Support to list metrics.
- Correlation-Id retirement
- HIP and HSA trace distinction:
  - --hip-runtime-trace: For collecting HIP Runtime API traces
  - --hip-compiler-trace: For collecting HIP compiler-generated code traces
  - --hsa-core-trace: For collecting HSA API traces (core API)
  - --hsa-amd-trace: For collecting HSA API traces (AMD-extension API)
  - --hsa-image-trace: For collecting HSA API traces (image-extension API)
  - --hsa-finalizer-trace: For collecting HSA API traces (finalizer-extension API)
ROCprofiler-SDK for AFAR IV
Added
API:
- Page migration reporting
- Scratch memory reporting
- Kernel dispatch callback tracing
- External correlation Id request service
- Buffered counter collection record headers
- Option to remove HSA dependency from counter collection
Tool:
- rocprofv3 multi-GPU support in a single process
ROCprofiler-SDK for AFAR V
Added
API:
- Agent or device counter collection
- PC sampling (beta)
Tool:
- Single JSON output format support
- Perfetto output format support (.pftrace)
- Input YAML support for counter collection
- Input JSON support for counter collection
- Application replay in counter collection
- rocprofv3 multi-GPU support:
  - Multiprocess (multiple files)
Changed
- The rocprofv3 tool now requires -- before the application. For detailed use, see Using rocprofv3.
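As a minimal sketch of the -- separator described above: tool options go before --, and the application with its own arguments goes after it. The application path ./my_app and the helper run_or_print are hypothetical names used only for illustration; --hip-runtime-trace is one of the tracing options listed earlier in this changelog.

```shell
#!/bin/sh
# run_or_print is a hypothetical helper: it runs the command if the tool is
# installed, and otherwise prints the command line as a dry run.
run_or_print() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "$*"
    fi
}

# Everything before -- is parsed by rocprofv3; everything after it is the
# application being profiled (./my_app is a placeholder for your binary).
run_or_print rocprofv3 --hip-runtime-trace -- ./my_app
```

Without the separator, the tool cannot tell its own options apart from the application's arguments, which is presumably why the separator became mandatory.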
Resolved issues
- Fixed SQ_ACCUM_PREV and SQ_ACCUM_PREV_HIRE overwriting issue
ROCprofiler-SDK 0.4.0 for ROCm release 6.2 (AFAR VI)
Added
- OTF2 tool support
- Kernel and range filtering
- Counter collection definitions in YAML
- Documentation updates (SQ block, counter collection, tracing, tool usage)
- rocprofv3 option --kernel-rename
- rocprofv3 options for Perfetto settings (buffer size and so on)
- CSV columns for kernel trace
  - Thread_Id
  - Dispatch_Id
- CSV column for counter collection
ROCprofiler-SDK 0.5.0 for ROCm release 6.3 (AFAR VII)
Added
- Start and end timestamp columns to the counter collection CSV output
- Check to force tools to initialize context id with zero
- Support to specify hardware counters for collection using rocprofv3 as rocprofv3 --pmc [COUNTER [COUNTER ...]]
- Memory Allocation Tracing
- PC sampling tool support with CSV and JSON output formats
- List supported PC Sampling Configurations
Changed
- --marker-trace option for rocprofv3 now supports the legacy ROCTx library libroctx64.so when the application is linked against the new library librocprofiler-sdk-roctx.so.
- Replaced deprecated hipHostMalloc and hipHostFree functions with hipExtHostAlloc and hipFreeHost for ROCm versions starting 6.3.
- Updated rocprofv3 --help options.
- Changed naming of "agent profiling" to a more descriptive "device counting service". To convert existing tool or user code to the new names, use the following sed command:
  find . -type f -exec sed -i 's/rocprofiler_agent_profile_callback_t/rocprofiler_device_counting_service_callback_t/g; s/rocprofiler_configure_agent_profile_counting_service/rocprofiler_configure_device_counting_service/g; s/agent_profile.h/device_counting_service.h/g; s/rocprofiler_sample_agent_profile_counting_service/rocprofiler_sample_device_counting_service/g' {} +
- Changed naming of "dispatch profiling service" to a more descriptive "dispatch counting service". To convert existing tool or user code to the new names, use the following sed command:
  find . -type f -exec sed -i -e 's/dispatch_profile_counting_service/dispatch_counting_service/g' -e 's/dispatch_profile.h/dispatch_counting_service.h/g' -e 's/rocprofiler_profile_counting_dispatch_callback_t/rocprofiler_dispatch_counting_service_callback_t/g' -e 's/rocprofiler_profile_counting_dispatch_data_t/rocprofiler_dispatch_counting_service_data_t/g' -e 's/rocprofiler_profile_counting_dispatch_record_t/rocprofiler_dispatch_counting_service_record_t/g' {} +
- FETCH_SIZE metric on gfx94x now uses TCC_BUBBLE for 128B reads.
- PMC dispatch-based counter collection serialization is now per-device instead of being global across all devices.
- Added output return functionality to rocprofiler_sample_device_counting_service
- Added rocprofiler_load_counter_definition.
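Because the rename commands in the entries above rewrite files in place, it may be worth trying them on a scratch copy first. The following is a minimal sketch that applies the "agent profiling" substitutions to a throwaway file. The sample file contents and the include path are hypothetical, the dot in the header name is escaped here so it matches only a literal dot, and sed -i is assumed to be the GNU variant.

```shell
#!/bin/sh
# Create a scratch directory with a hypothetical source file that still uses
# the old "agent profile" names.
workdir=$(mktemp -d)
cat > "$workdir/tool.cpp" <<'EOF'
#include <rocprofiler-sdk/agent_profile.h>
rocprofiler_agent_profile_callback_t cb;
EOF

# Same substitutions as the changelog entry, limited to the scratch directory.
find "$workdir" -type f -exec sed -i \
    -e 's/rocprofiler_agent_profile_callback_t/rocprofiler_device_counting_service_callback_t/g' \
    -e 's/rocprofiler_configure_agent_profile_counting_service/rocprofiler_configure_device_counting_service/g' \
    -e 's/agent_profile\.h/device_counting_service.h/g' \
    -e 's/rocprofiler_sample_agent_profile_counting_service/rocprofiler_sample_device_counting_service/g' \
    {} +

# Show the rewritten file, then clean up.
result=$(cat "$workdir/tool.cpp")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Once the output looks right, the same command can be run from the root of the real source tree, ideally under version control so the change can be reviewed as a diff.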
Resolved issues
- Create subdirectory when rocprofv3 --output-file includes a folder path
- Fixed misaligned stores (undefined behavior) for buffer records
- Fixed crash when only scratch reporting is enabled
- Fixed MeanOccupancy metrics
- Fixed aborted-application validation test to properly check for hipExtHostAlloc command
- Fixed implicit reduction of SQ and GRBM metrics
- Fixed support for derived counters in reduce operation
- Bug fixed in max-in-reduce operation
- Introduced fix to handle a range of values for select() dimension in expressions parser
- Conditional aql::set_profiler_active_on_queue only when counter collection is registered (resolves Navi3 kernel tracing issues)
Removed
- Removed gfx8 metric definitions
- Removed rocprofv3 installation to sbin directory
ROCprofiler-SDK 0.6.0 for ROCm release 6.4
Added
- Support for select() operation in counter expression.
- reduce() operation for counter expression with respect to dimension.
- --collection-period feature in rocprofv3 to enable filtering using time.
- --collection-period-unit feature in rocprofv3 to control time units used in collection period option.
- Deprecation notice for ROCProfiler and ROCProfilerV2.
- Support for rocDecode API Tracing
- Usage documentation for ROCTx
- Usage documentation for MPI applications
- SDK: rocprofiler_agent_v0_t support for agent UUIDs
- SDK: rocprofiler_agent_v0_t support for agent visibility based on GPU isolation environment variables such as ROCR_VISIBLE_DEVICES and so on.
- Accumulation VGPR support for rocprofv3.
- Added --agent-index option in rocprofv3 to specify the agent naming convention in the output
  - absolute == node_id
  - relative == logical_node_id
  - type-relative == logical_node_type_id
ROCprofiler-SDK 0.7.0 for ROCm release 6.5
Added
- Added support for rocJPEG API Tracing
- Added rocprofiler_create_counter to allow for adding custom derived counters at runtime.
Changed
- SDK no longer creates a background thread when every tool returns a nullptr from rocprofiler_configure.
- Updated disassembly.hpp's vaddr-to-file-offset mapping to use the dedicated comgr API.
Resolved issues
- Fixed missing callbacks around internal thread creation within counter collection service
Removed