Update CHANGELOG.md (#557)
* Update CHANGELOG.md
* some more updates
---------
Co-authored-by: srawat <120587655+SwRaw@users.noreply.github.com>
[ROCm/rocprofiler-sdk commit: ca9905d6bb]
Committed by GitHub
Parent: 6a6b16be93
Commit: b2ad20d187
@@ -41,4 +41,6 @@
/.github

# VSCode Workspaces
*.code-workspace
*.code-workspace
rocprofiler-sdk-build/CMakeCache.txt
/rocprofiler-sdk-build
@@ -164,62 +164,62 @@ Full documentation for ROCprofiler-SDK is available at [rocm.docs.amd.com/projec

### Added

- Added support for rocJPEG API Tracing
- Added MI350X/MI355X support
- Added rocprofiler_create_counter to allow for adding custom derived counters at runtime.
- Added support for iteration based counter multiplexing to rocprofv3 (see documentation)
- Added perfetto support for counter collection.
- Added support for negating rocprofv3 tracing options when using aggregate options, e.g. `--sys-trace --hsa-trace=no`
- Added `--agent-index` option in rocprofv3 to specify the agent naming convention in the output
- Support for [rocJPEG](https://rocm.docs.amd.com/projects/rocJPEG/en/latest/index.html) API Tracing.
- Support for AMD Instinct MI350X and MI355X accelerators.
- `rocprofiler_create_counter` to facilitate adding custom derived counters at runtime.
- Support in `rocprofv3` for iteration based counter multiplexing.
- Perfetto support for counter collection.
- Support for negating `rocprofv3` tracing options when using aggregate options such as `--sys-trace --hsa-trace=no`.
- `--agent-index` option in `rocprofv3` to specify the agent naming convention in the output:
  - absolute == node_id
  - relative == logical_node_id
  - type-relative == logical_node_type_id
- Added MI300/MI350 stochastic (hardware-based) PC sampling support in ROCProfiler-SDK and ROCProfV3
- Python bindings for rocprofiler-sdk-roctx
- SQLite3 output support for rocprofv3 (`--output-format rocpd`)
- Added `rocprofiler-sdk-rocpd` package
  - public API in `include/rocprofiler-sdk-rocpd/rocpd.h`
  - library implementation in `librocprofiler-sdk-rocpd.so`
  - support for `find_package(rocprofiler-sdk-rocpd)`
  - `rocprofiler-sdk-rocpd` DEB and RPM packages
- Support `--version` option for `rocprofv3`
- Added `rocpd` Python package
- Added thread trace as experimental API
- Added ROCprof Trace Decoder as experimental API
  - Requires [ROCprof Trace Decoder plugin](https://github.com/rocm/rocprof-trace-decoder)
- Added thread trace option to the rocprofv3 tool under the --att parameters
- MI300 and MI350 stochastic (hardware-based) PC sampling support in ROCProfiler-SDK and `rocprofv3`.
- Python bindings for `rocprofiler-sdk-roctx`
- SQLite3 output support for `rocprofv3` using `--output-format rocpd`.
- `rocprofiler-sdk-rocpd` package:
  - Public API in `include/rocprofiler-sdk-rocpd/rocpd.h`.
  - Library implementation in `librocprofiler-sdk-rocpd.so`.
  - Support for `find_package(rocprofiler-sdk-rocpd)`.
  - `rocprofiler-sdk-rocpd` DEB and RPM packages.
- `--version` option in `rocprofv3`.
- `rocpd` Python package.
- Thread trace as experimental API.
- ROCprof Trace Decoder as experimental API:
  - Requires [ROCprof Trace Decoder plugin](https://github.com/rocm/rocprof-trace-decoder).
- Thread trace option in the `rocprofv3` tool under the `--att` parameters:
  - See [using thread trace with rocprofv3](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/amd-mainline/how-to/using-thread-trace.html)
  - Requires the ROCprof Trace Decoder plugin installed (see above).
- Added `rocpd` output format documentation
  - Requires the ROCprof Trace Decoder plugin installed (see above)
- Added perfetto support for scratch memory.
- Added documentation for rocprofv3 advanced options
  - Requires [ROCprof Trace Decoder plugin](https://github.com/rocm/rocprof-trace-decoder).
- `rocpd` output format documentation:
  - Requires [ROCprof Trace Decoder plugin](https://github.com/rocm/rocprof-trace-decoder).
- Perfetto support for scratch memory.
- Support in the `rocprofv3` avail tool for command-line arguments.
- Documentation for `rocprofv3` advanced options.
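The `--agent-index` entry in this section maps each option value to one agent field. A minimal Python sketch of that lookup follows, assuming an agent record can be modeled as a plain dict; the `agent_index` helper and the sample field values are hypothetical and not part of `rocprofv3`. Only the three value-to-field pairs come from the changelog.

```python
# Hypothetical sketch: each --agent-index value, as listed in the changelog,
# selects the agent field used for agent naming in rocprofv3 output.
AGENT_INDEX_FIELDS = {
    "absolute": "node_id",
    "relative": "logical_node_id",
    "type-relative": "logical_node_type_id",
}


def agent_index(agent: dict, convention: str) -> int:
    """Return the index that names `agent` under `convention`.

    `agent` is a plain dict standing in for an agent record (an assumption
    for illustration); only the field names come from the changelog.
    """
    if convention not in AGENT_INDEX_FIELDS:
        raise ValueError(f"unknown --agent-index value: {convention!r}")
    return agent[AGENT_INDEX_FIELDS[convention]]


if __name__ == "__main__":
    # Illustrative field values, not taken from a real system.
    gpu = {"node_id": 2, "logical_node_id": 1, "logical_node_type_id": 0}
    for convention in AGENT_INDEX_FIELDS:
        print(convention, agent_index(gpu, convention))
```

A table-driven mapping keeps the three conventions in one place and rejects unknown values instead of silently falling back to one of them.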

### Changed

- SDK no longer creates a background thread when every tool returns a nullptr from `rocprofiler_configure`.
- Updated disassembly.hpp's vaddr-to-file-offset mapping to use the dedicated comgr API.
- rocprofiler_uuid_t ABI is changed to hold 128 bit value.
- rocprofv3 shorthand argument for `--collection-period` is now `-P` (upper-case) as `-p` (lower-case) is reserved for later use
- default output format for rocprofv3 is now `rocpd` (SQLite3 database)
- rocprofv3 avail tool renamed from rocprofv3_avail to rocprofv3-avail tool
- rocprofv3 avail tool has support for command line arguments.
- rocprofv3 tool now allows for Thread Trace + PC Sampling on the same agent
- fixed inconsistency for what is a "null" handle in `rocprofiler_*_id_t` structs.
  - correct answer is `.handle = 0` but some definitions used `UINT64_MAX`
- SDK to not create a background thread when every tool returns a nullptr from `rocprofiler_configure`.
- `vaddr-to-file-offset` mapping in `disassembly.hpp` to use the dedicated comgr API.
- `rocprofiler_uuid_t` ABI to hold a 128-bit value.
- `rocprofv3` shorthand argument for `--collection-period` to `-P` (upper-case) while `-p` (lower-case) is reserved for later use.
- Default output format for `rocprofv3` to `rocpd` (SQLite3 database).
- `rocprofv3` avail tool to be renamed from `rocprofv3_avail` to `rocprofv3-avail` tool.
- `rocprofv3` tool to facilitate thread trace and PC sampling on the same agent.
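Since `rocprofv3` now defaults to the `rocpd` format, which is an SQLite3 database, its output can be inspected with any SQLite client. A minimal sketch using Python's standard `sqlite3` module to list the tables in such a file; the `list_tables` helper is hypothetical, and no rocpd schema is assumed beyond whatever tables the database itself reports.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in an SQLite3 database, sorted.

    Works on any SQLite3 file, including (by assumption) a rocpd database
    written by `rocprofv3 --output-format rocpd`; no schema is assumed.
    """
    # closing() ensures the connection is closed, not just committed.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, `list_tables("results.db")` (a hypothetical path) returns the table names, from which individual tables can then be queried with ordinary SQL.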

### Resolved issues

- Fixed missing callbacks around internal thread creation within counter collection service
- Fixed potential data race in rocprofiler-sdk double buffering scheme
- Usage of std::regex in core rocprofiler-sdk library which causes segfaults/exceptions when used under dual ABI
- Fixed perfetto counter collection by introducing per dispatch accumulation.
- Code object disassembly was missing function inlining information
- Fixed queue preemption error and HSA_STATUS_ERROR_INVALID_PACKET_FORMAT error for stochastic PC-sampling for MI300X, leading to more stable runs.
- Fixed the system hang issue for host-trap PC-sampling on MI300X.
- Fixed rocpd counter collection issue when counter collection alone is enabled, rocpd_kernel_dispatch table gets populated by counters data instead of kernel_dispatch data.
- Fixed kernel trace csv output generated by rocpd.
- Fixed missing callbacks around internal thread creation within counter collection service.
- Fixed potential data race in the ROCprofiler-SDK double buffering scheme.
- Fixed usage of std::regex in the core ROCprofiler-SDK library that caused segfaults or exceptions when used under dual ABI.
- Fixed Perfetto counter collection by introducing accumulation per dispatch.
- Fixed code object disassembly for missing function inlining information.
- Fixed queue preemption error and `HSA_STATUS_ERROR_INVALID_PACKET_FORMAT` error for stochastic PC-sampling in MI300X, leading to stabler runs.
- Fixed the system hang issue for host-trap PC-sampling on MI300X.
- Fixed `rocpd` counter collection issue when counter collection alone is enabled. `rocpd_kernel_dispatch` table is updated to be populated by counters data instead of kernel_dispatch data.
- Fixed `rocprofiler_*_id_t` structs for inconsistency related to a "null" handle:
  - The correct definition for a null handle is `.handle = 0` while some definitions previously used `UINT64_MAX`.
- Fixed kernel trace csv output generated by `rocpd`.
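The null-handle fix in this section settles on `.handle = 0` as the null value for `rocprofiler_*_id_t` structs. A minimal Python sketch of a validity check under that convention, using `ctypes` with a hypothetical stand-in struct `ExampleId` that assumes a single unsigned 64-bit `handle` field; it is not a binding to the real rocprofiler-sdk types.

```python
import ctypes


class ExampleId(ctypes.Structure):
    """Hypothetical stand-in for a rocprofiler_*_id_t struct.

    Assumes a single unsigned 64-bit `handle` field; not the real SDK type.
    """

    _fields_ = [("handle", ctypes.c_uint64)]


# Per the changelog, the null handle is `.handle = 0`.
NULL_ID = ExampleId(handle=0)

# Some earlier definitions used UINT64_MAX as "null"; this check does not.
UINT64_MAX = (1 << 64) - 1


def is_null(identifier: ExampleId) -> bool:
    """Return True only for the canonical null handle (`.handle == 0`)."""
    return identifier.handle == 0
```

Comparing against a single sentinel keeps every ID type consistent, which is the inconsistency the fix addresses.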

### Removed

- Support of gfx940 and gfx941 targets from compilation
- Support for compilation of gfx940 and gfx941 targets.