[rocprofv3] Use -P for collection period shorthand option (#356)

* [rocprofv3] Use -P for collection period option
  - Reserve -p for profiler attachment
* Update changelog

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
+1
-1
@@ -177,11 +177,11 @@ Full documentation for ROCprofiler-SDK is available at [rocm.docs.amd.com/projec
 - type-relative == logical_node_type_id
 - Added MI300 stochastic (hardware-based) PC sampling support in ROCProfiler-SDK and ROCProfV3


 ### Changed

 - SDK no longer creates a background thread when every tool returns a nullptr from `rocprofiler_configure`.
 - Updated disassembly.hpp's vaddr-to-file-offset mapping to use the dedicated comgr API.
-
+- rocprofv3 shorthand argument for `--collection-period` is now `-P` (upper-case) as `-p` (lower-case) is reserved for later use

 ### Resolved issues

+20
-2
@@ -200,6 +200,7 @@ For MPI applications (or other job launchers such as SLURM), place rocprofv3 ins
 description="ROCProfilerV3 Run Script",
 usage="%(prog)s [options] -- <application> [application options]",
 epilog=usage_examples,
+allow_abbrev=False,
 formatter_class=format_help(argparse.RawTextHelpFormatter),
 )

@@ -501,7 +502,7 @@ For MPI applications (or other job launchers such as SLURM), place rocprofv3 ins
 type=str,
 )
 filter_options.add_argument(
-"-p",
+"-P",
 "--collection-period",
 help="The times are specified in seconds by default, but the unit can be changed using the `--collection-period-unit` option. Start Delay Time is the time in seconds before the collection begins, Collection Time is the duration in seconds for which data is collected, and Rate is the number of times the cycle is repeated. A repeat of 0 indicates that the cycle will repeat indefinitely. Users can specify multiple configurations, each defined by a triplet in the format `start_delay:collection_time:repeat`",
 nargs="+",
@@ -511,7 +512,7 @@ For MPI applications (or other job launchers such as SLURM), place rocprofv3 ins
 )
 filter_options.add_argument(
 "--collection-period-unit",
-help="To change the unit used in `--collection-period` or `-p`, you can specify the desired unit using the `--collection-period-unit` option. The available units are `hour` for hours, `min` for minutes, `sec` for seconds, `msec` for milliseconds, `usec` for microseconds, and `nsec` for nanoseconds",
+help="To change the unit used in `--collection-period` or `-P`, you can specify the desired unit using the `--collection-period-unit` option. The available units are `hour` for hours, `min` for minutes, `sec` for seconds, `msec` for milliseconds, `usec` for microseconds, and `nsec` for nanoseconds",
 nargs=1,
 default=["sec"],
 type=str,
@@ -640,6 +641,16 @@ For MPI applications (or other job launchers such as SLURM), place rocprofv3 ins
 metavar="KB",
 )

+reserved_options = parser.add_argument_group("Reserved options")
+reserved_options.add_argument(
+"-p",
+"--pid",
+help=argparse.SUPPRESS,
+type=str,
+nargs="+",
+default=None,
+)
+
 if args is None:
 args = sys.argv[1:]

@@ -886,6 +897,13 @@ def run(app_args, args, **kwargs):
 use_execv = kwargs.get("use_execv", True)
 app_pass = kwargs.get("pass_id", None)

+if args.pid is not None:
+fatal_error(
+"""The -p shorthand option for --collection-period is now an upper-case -P
+In the future, rocprofv3 plans to support debugger-like process attachment and -p
+is de-facto standard shorthand option for this feature"""
+)
+
 def setattrifnone(obj, attr, value):
 if getattr(obj, f"{attr}") is None:
 setattr(obj, f"{attr}", value)
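The argparse hunks above reserve `-p`/`--pid` by hiding it with `help=argparse.SUPPRESS` and then failing early if anyone still passes it. A minimal, hypothetical reduction of that pattern using only the standard-library `argparse` module is sketched below; it is not the rocprofv3 parser. The names `build_parser` and `parse` are illustrative, and `parser.error()` stands in for rocprofv3's own `fatal_error()` helper. `allow_abbrev=False` is included because the commit adds it; in argparse it disables prefix matching of long options, so `--collection` is rejected instead of being expanded to `--collection-period`.

```python
import argparse


def build_parser():
    # Hypothetical, trimmed-down parser holding only the options touched by this commit.
    parser = argparse.ArgumentParser(prog="rocprofv3-sketch", allow_abbrev=False)
    parser.add_argument("-P", "--collection-period", nargs="+", type=str, default=None)

    reserved = parser.add_argument_group("Reserved options")
    # help=argparse.SUPPRESS hides the option from --help and usage output,
    # but argparse still parses it, so old "-p" invocations can be detected.
    reserved.add_argument(
        "-p", "--pid", nargs="+", type=str, default=None, help=argparse.SUPPRESS
    )
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.pid is not None:
        # parser.error() prints the usage line and message to stderr, then exits with status 2.
        parser.error("-p is reserved for process attachment; use -P/--collection-period")
    return args


if __name__ == "__main__":
    print(parse(["-P", "1:2:0"]).collection_period)  # ['1:2:0']
```

Keeping the reserved option registered (rather than simply deleting `-p`) is what lets the tool print a targeted migration message instead of argparse's generic "unrecognized arguments" error.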
@@ -19,11 +19,11 @@ Here are the distinct ROCprofiler-SDK features, which also highlight the improve
 - PC sampling (beta implementation)

 The former implementations allow a tool to access any of the services provided by ROCProfiler or ROCTracer, such as API tracing and kernel tracing, by calling ``roctracer_init()`` when an ROCm runtime is initially loaded.
 As the calling tool is not required to specify during initialization, the services it needs to use, the libraries must be effectively prepared for any service to be available anytime.
 This behavior introduces unnecessary overhead and makes thread-safe data management difficult, as tools generally don't use all the available services.
 For example, ROCTracer always installs wrappers around every runtime API and adds indirection overhead through the ROCTracer library to check for the current service configuration in a thread-safe manner.

 ROCprofiler-SDK introduces `context` to solve the preceding issues. Contexts are effectively bundles of service configurations. ROCprofiler-SDK provides a single opportunity for a tool to create as many contexts as required.
 A tool can group all services into one context, create one context per service, or choose a mix.
 This change in the design allows ROCprofiler-SDK to be aware of the services that might be requested by a tool at any given time.
 The design change empowers ROCprofiler-SDK to:
@@ -50,36 +50,36 @@ ROCprofiler-SDK introduces a new command-line tool, `rocprofv3`, which is a more
 - rocprofv3
 - Improvements
 - Notes
 * - Basic tracing options
 - HIP Trace
 - `--hip-trace`
 - `--hip-api`, `--hip-trace`
 - `--hip-trace`
 - No change
 - | rocprof and rocprofv2 `--hip-trace` options include kernel dispatches and memory copy activities,
 | which is not the case in rocprofv3
 * - Basic tracing options
 - HSA Trace
 - `--hsa-trace`
 - `--hsa-trace`
 - `--hsa-trace`
 - No change
 - | rocprof and rocprofv2 `--hsa-trace` options include kernel dispatches and memory copy activities,
 | which is not the case in rocprofv3
 * - Basic tracing options
 - Scratch Memory Trace
 - *Not Available*
 - *Not Available*
 - `--scratch-memory-trace`
 - New option to trace scratch memory operations
 -
 * - Basic tracing options
 - Marker Trace (ROCTx)
 - `--roctx-trace`
 - `--roctx-trace`
 - `--marker-trace`
 - Improved ROCTx library with more features
 -
 * - Basic tracing options
 - Memory Copy Trace
 - Part of HIP and HSA Traces
@@ -93,56 +93,56 @@ ROCprofiler-SDK introduces a new command-line tool, `rocprofv3`, which is a more
 - *Not Available*
 - `--memory-allocation-trace`
 - New option for collecting Memory Allocation Traces. Displays starting address, allocation size, and agent where allocation occurred.
 -
 * - Basic tracing options
 - Kernel Trace
 - `--kernel-trace`
 - `--kernel-trace`
 - `--kernel-trace`
 - Performance improvement.
 -
 * - Granular tracing options
 - HIP runtime trace
 - Part of `--hip-trace` option
 - Part of `--hip-trace` option
 - `--hip-runtime-trace`
 - For collecting HIP Runtime API Traces, e.g. public HIP API functions starting with 'hip' (i.e. hipSetDevice).
 -
 * - Granular tracing options
 - HIP compiler trace
 - *Not Available*
 - *Not Available*
 - `--hip-compiler-trace`
 - For collecting HIP Compiler generated code Traces, e.g. HIP API functions starting with '__hip' (i.e. __hipRegisterFatBinary).
 -
 * - Granular tracing options
 - HSA core API trace
 - Part of `--hsa-trace` option
 - Part of `--hsa-trace` option
 - `--hsa-core-trace`
 - New option for collecting only HSA API Traces (core API), e.g. HSA functions prefixed with only `hsa_` (i.e. hsa_init)
 -
 * - Granular tracing options
 - HSA AMD trace
 - Part of `--hsa-trace` option
 - Part of `--hsa-trace` option
 - `--hsa-amd-trace`
 - For collecting HSA API Traces (AMD-extension API), e.g. HSA function prefixed with `hsa_amd_` (i.e. hsa_amd_coherency_get_type)
 -
 * - Granular tracing options
 - HSA Image Extension trace
 - Part of `--hsa-trace` option
 - Part of `--hsa-trace` option
 - `--hsa-image-trace`
 - New option for collecting HSA API Traces (Image-extension API), e.g. HSA functions prefixed with only `hsa_ext_image_` (i.e. hsa_ext_image_get_capability).
 -
 * - Granular tracing options
 - HSA Finalizer trace
 - Part of `--hsa-trace` option
 - Part of `--hsa-trace` option
 - `--hsa-finalizer-trace`
 - New option for collecting HSA API Traces (Finalizer-extension API), e.g. HSA functions prefixed with only `hsa_ext_program_` (i.e. hsa_ext_program_create)
 -
 * - Advanced tracing options
 - Kokkos trace
 - *Not Available*
@@ -156,70 +156,70 @@ ROCprofiler-SDK introduces a new command-line tool, `rocprofv3`, which is a more
 - *Not Available*
 - `--rccl-trace`
 - For collecting RCCL (ROCm Communication Collectives Library. Also pronounced as 'Rickle' ) Traces
 -
 * - Advanced tracing options
 - Scratch memory trace
 - *Not Available*
 - *Not Available*
 - `--scratch-memory-trace`
 - Collecting scratch memory event traces.
 -
 * - Advanced tracing options
 - rocDecode trace
 - *Not Available*
 - *Not Available*
 - `--rocdecode-trace`
 - Tracing rocDecode library.
 -
 * - Advanced tracing options
 - rocJPEG trace
 - *Not Available*
 - *Not Available*
 - `--rocjpeg-trace`
 - Tracing rocJPEG library.
 -
 * - Aggregate tracing options
 - Sys Trace
 - `--sys-trace` [hip-trace|hsa-trace|roctx-trace|kernel-trace]
 - `--sys-trace` [hip-trace|hsa-trace|roctx-trace|kernel-trace]
 - ` -s, --sys-trace` [hip-trace|hsa-trace|scratch-trace|memory-copy-trace|roctx-trace|kernel-trace]
 - Extends the sys trace options with more features
 -
 * - Aggregate tracing options
 - Runtime Trace
 - *Not available*
 - *Not available*
 - ` -r, --runtime-trace` [hip-runtime-trace|scratch-trace|memory-copy-trace|roctx-trace|kernel-trace]
 - New option to aggregate trace operations
 -
 * - Kernel naming options
 - Kernel Name Mangling
 - *Not Available*
 - *Not Available*
 - `-M`, `--mangled-kernels`
 - New option for mangled kernel names
 -
 * - Kernel naming options
 - Kernel Name Truncation
 - `--basenames <on|off>`
 - `--basenames`
 - `-T`, `--truncate-kernels`
 - New option for truncating the demangled kernel names
 -
 * - Kernel naming options
 - Kernel Rename
 - `--roctx-rename`
 - *Not available*
 - `--kernel-rename`
 - New option to use region names defined by roctxRangePush/roctxRangePop regions to rename the kernels
 -
 * - Post-processing tracing options
 - Statistics
 - --stats
 - *Not Available*
 - --stats
 - Statistics for the collected traces
 -
 * - Post-processing tracing options
 - Summary
 - *Not available*
@@ -240,28 +240,28 @@ ROCprofiler-SDK introduces a new command-line tool, `rocprofv3`, which is a more
 - *Not available*
 - `--summary-groups REGULAR_EXPRESSION`
 - New option to output a summary for each set of domains matching the regular expression, e.g. 'KERNEL_DISPATCH|MEMORY_COPY' will generate a summary from all the tracing data in the KERNEL_DISPATCH and MEMORY_COPY domains
 -
 * - Summary options
 - Summary Output File
 - *Not available*
 - *Not available*
 - `--summary-output-file SUMMARY_OUTPUT_FILE`
 - New option to output summary to a file, stdout, or stderr (default: stderr)
 -
 * - Summary options
 - Summary Units
 - *Not available*
 - *Not available*
 - `-u , --summary-units`
 - New option to output summary in desired time units {sec,msec,usec,nsec}
 -
 * - Display options
 - List available basic and derived metrics and PC sampling configurations
 - `--list-basic`, `--list-derived`
 - `--list-counters`
 - `-L`, `--list-avail`
 - A valid YAML is supported for this option now
 -
 * - Perfetto-specific options
 - Perfetto data collection backend
 - *Not available*
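The `--summary-groups` row above describes selecting tracing domains with a regular expression such as `KERNEL_DISPATCH|MEMORY_COPY`. The sketch below illustrates that idea with Python's `re` module; it is an assumption-laden illustration, not rocprofv3's summary code. The `summarize_domains` function, the `(domain, name, duration_ns)` record layout, and the sample data (including the `HIP_RUNTIME_API` domain name) are hypothetical, and whether rocprofv3 anchors the match like `re.fullmatch` does is also an assumption.

```python
import re
from collections import defaultdict


def summarize_domains(records, pattern):
    """Sum durations per operation name for records whose domain matches `pattern`.

    `records` is a hypothetical list of (domain, name, duration_ns) tuples.
    """
    regex = re.compile(pattern)
    totals = defaultdict(int)
    for domain, name, duration_ns in records:
        # fullmatch: the whole domain name must match one of the alternatives.
        if regex.fullmatch(domain):
            totals[name] += duration_ns
    return dict(totals)


# Hypothetical sample data.
records = [
    ("KERNEL_DISPATCH", "kernel_a", 100),
    ("MEMORY_COPY", "copy_h2d", 40),
    ("HIP_RUNTIME_API", "hipMalloc", 7),
    ("KERNEL_DISPATCH", "kernel_a", 50),
]
print(summarize_domains(records, "KERNEL_DISPATCH|MEMORY_COPY"))
# {'kernel_a': 150, 'copy_h2d': 40}
```

Because the pattern is an alternation, a single expression pulls several domains into one combined summary, which matches the table's description of one summary per matching set of domains.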
@@ -275,7 +275,7 @@ ROCprofiler-SDK introduces a new command-line tool, `rocprofv3`, which is a more
 - Setting env variable `rocprofiler_PERFETTO_MAX_BUFFER_SIZE_KIB` to the desired buffer size
 - `--perfetto-buffer-size` {KB}
 - New option to define size of buffer for perfetto output in KB. default: 1 GB
 -
 * - Perfetto-specific options
 - Perfetto Buffer fill Policy
 - *Not available*
@@ -289,48 +289,48 @@ ROCprofiler-SDK introduces a new command-line tool, `rocprofv3`, which is a more
 - *Not available*
 - `--perfetto-shmem-size-hint` KB
 - New option to define perfetto shared memory size hint in KB. default: 64 KB
 -
 * - Filtering options
 - Kernel Filtration options for Counter Collection
 - Supported in input.xml file (supports range, gpu and kernel filtration)
 - kernel: <kernel_name> (can only be provided in input.txt file)
 - `--kernel-include-regex`, `--kernel-exclude-regex`, `--kernel-iteration-range`
 - Extensive control over output options using regular expressions
 -
 * - I/O options
 - Output Directory
 - `-d` <data directory>
 - `-d` | `--output-directory`
 - `-d` OUTPUT_DIRECTORY, `--output-directory` OUTPUT_DIRECTORY
 - rocprofv3 supports special keys for runtime values, e.g. %pid% gets replaced by the process ID
 -
 * - I/O options
 - Output File
 - `-o` <output file>
 - `-o` | `--output-file-name`
 - `-o` OUTPUT_FILE, `--output-file` OUTPUT_FILE
 - rocprofv3 supports special keys for runtime values, e.g. %pid% gets replaced by the process ID
 -
 * - I/O options
 - Logging
 - Minimal logging via environment variable
 - Minimal logging via environment variable
 - --log-level {fatal,error,warning,info,trace,env}
 - Extensive logging options
 -
 * - I/O options
 - Plugins
 - *Not Available*
 - plugin support for different output formats
 - Replaced by `--output-format` option
 - Not needed as rocprofv3 supports multiple output formats
 -
 * - I/O options
 - Output Formats
 - CSV, JSON (Chrome-Tracing format)
 - CSV, JSON (Chrome-Tracing format), Perfetto, CTF
 - CSV, JSON (custom schema), Perfetto, OTF2
 - | # Multiple output formats can be supported in single run.
 | # OTF2 can visualize larger trace files compared to perfetto.
 - The Perfetto UI does not accept the JSON output format produced by rocprofv3. Perfetto is dropping support for the JSON Chrome tracing format in favor of the binary Perfetto protobuf format (``.pftrace`` extension), which is supported by rocprofv3.
 * - I/O options
@@ -349,25 +349,25 @@ ROCprofiler-SDK introduces a new command-line tool, `rocprofv3`, which is a more
 - `--pmc`
 - New option to collect performance counters from command line. Counters should be comma OR space separated in case of more than 1 counters
 -
 * - I/O options
 - Providing Custom metrics file
 - `-m` <metric file>
 - `-m` <metric file>
 - `-E` <metric file> --pmc <counter>
 - In rocprofv3, this option has changed to provide a file with custom metrics and collect performance counters from the command line using --pmc option
 -
 * - Advanced options
 - Preload
 - *Not Available*
 - *Not Available*
 - --preload
 - Libraries to prepend to LD_PRELOAD (usually for sanitizers)
 -
 * - Trace Control options
 - Trace Period
 - `--trace-period`
 - `-tp | --trace-period`
-- `-p |--collection-period`,`--collection-period-unit`
+- `-P |--collection-period`,`--collection-period-unit`
 - Users can specify multiple configurations, each defined by a triplet in the format `start_delay:collection_time:repeat`, with the ability to change the unit of time in the given configurations.
 -
 * - Trace Control options
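The `-P`/`--collection-period` row above (and the option's help text earlier in this commit) describes each configuration as a `start_delay:collection_time:repeat` triplet whose times are scaled by `--collection-period-unit`, with a repeat of 0 meaning "repeat indefinitely". A minimal parser for that format is sketched below under stated assumptions: the names `parse_collection_periods` and `CollectionPeriod` are hypothetical, the values are assumed to be non-negative integers, and normalizing to nanoseconds is an illustrative choice, not necessarily what rocprofv3 does internally.

```python
from dataclasses import dataclass

# Nanoseconds per unit, for the unit names listed in the
# --collection-period-unit help text (hour, min, sec, msec, usec, nsec).
NS_PER_UNIT = {
    "hour": 3_600_000_000_000,
    "min": 60_000_000_000,
    "sec": 1_000_000_000,
    "msec": 1_000_000,
    "usec": 1_000,
    "nsec": 1,
}


@dataclass(frozen=True)
class CollectionPeriod:
    start_delay_ns: int
    collection_time_ns: int
    repeat: int  # 0 = repeat the cycle indefinitely


def parse_collection_periods(triplets, unit="sec"):
    """Convert 'start_delay:collection_time:repeat' strings into nanosecond periods."""
    if unit not in NS_PER_UNIT:
        raise ValueError(f"unsupported unit {unit!r}")
    scale = NS_PER_UNIT[unit]
    periods = []
    for triplet in triplets:
        fields = triplet.split(":")
        if len(fields) != 3:
            raise ValueError(f"expected start_delay:collection_time:repeat, got {triplet!r}")
        delay, duration, repeat = (int(field) for field in fields)
        if delay < 0 or duration < 0 or repeat < 0:
            raise ValueError(f"negative value in {triplet!r}")
        periods.append(CollectionPeriod(delay * scale, duration * scale, repeat))
    return periods
```

For example, the triplets `1:2:0` and `5:10:3` with unit `msec` describe one window that starts after 1 ms, collects for 2 ms and repeats forever, plus a second window that starts after 5 ms, collects for 10 ms and runs 3 times.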
@@ -376,14 +376,14 @@ ROCprofiler-SDK introduces a new command-line tool, `rocprofv3`, which is a more
 - *Not available*
 - *Not available*
 - Not yet in rocprofv3
 -
 * - Trace Control options
 - Flush Interval
 - `--flush-rate`
 - `--flush-interval`
 - *Not available*
 - Not applicable for rocprofv3
 -
 * - Trace Control options
 - Merge Traces
 - `--merge-traces`
@@ -397,46 +397,46 @@ ROCprofiler-SDK introduces a new command-line tool, `rocprofv3`, which is a more
 - *Not available*
 - `--pc-sampling-beta-enabled`
 - Enable pc sampling support; beta version.
 -
 * - Legacy options
 - Timestamp On/Off
 - `--timestamp <on|off>`
 - *Not available*
 - *Not available*
 - Not applicable for rocprofv3
 -
 * - Legacy options
 - Context wait
 - `--ctx-wait`
 - *Not available*
 - *Not available*
 - Not applicable for rocprofv3
 -
 * - Legacy options
 - Context Limit
 - `--ctx-limit <max number>`
 - *Not available*
 - *Not available*
 - Not applicable for rocprofv3
 -
 * - Legacy options
 - Code Object Tracking
 - `--obj-tracking <on|off>`
 - Always ``ON`` in rocprofv2
 - Always ``ON`` in rocprofv3
 -
 -
 * - Legacy options
 - Heartbeat
 - `--heartbeat <rate sec>`
 - *Not available*
 - *Not available*
 - Not applicable for rocprofv3
 -


 ========================================================
 Timing Difference Between rocprofv3 and rocprofv1/v2
 ========================================================

 ``rocprofv3`` has improved the accuracy of timing information by reducing the tool overhead required to collect data and reducing the interference to the timing of the kernel being measured. The result of this work is a reduction in variance of kernel times received for the same kernel execution and more accurate timing in general. These changes have not been backported (and will not be backported) to rocprofv1/v2, so there can be substantial (20%) differences in execution time reported by v1/v2 vs v3 for a single kernel execution. Over a large number of samples of the same kernel, the difference in average execution time is in the low single digit percentage time with a much tighter variance of results on rocprofv3. We have included testing in the test suite to verify the timing information outputted by rocprofv3 to ensure that the values we are returning are accurate.
|
``rocprofv3`` has improved the accuracy of timing information by reducing the tool overhead required to collect data and reducing the interference to the timing of the kernel being measured. The result of this work is a reduction in variance of kernel times received for the same kernel execution and more accurate timing in general. These changes have not been backported (and will not be backported) to rocprofv1/v2, so there can be substantial (20%) differences in execution time reported by v1/v2 vs v3 for a single kernel execution. Over a large number of samples of the same kernel, the difference in average execution time is in the low single digit percentage time with a much tighter variance of results on rocprofv3. We have included testing in the test suite to verify the timing information outputted by rocprofv3 to ensure that the values we are returning are accurate.
|
||||||
|
|||||||
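If you want to check the timing paragraph above against your own workload, you can summarize repeated dispatches of one kernel by their mean duration and coefficient of variation, then compare the two profilers. The sketch below is a minimal illustration. `summarize` and `compare` are hypothetical helpers, not part of rocprofv3. The duration lists in the demo are obviously synthetic placeholders, not measurements. In practice they would be `End_Timestamp - Start_Timestamp` for many dispatches of the same kernel.

```python
import statistics


def summarize(durations):
    """Return (mean, coefficient of variation) for repeated runs of one kernel."""
    mean = statistics.mean(durations)
    # pstdev treats the list as the whole population of runs being compared.
    cv = statistics.pstdev(durations) / mean
    return mean, cv


def compare(older_tool, rocprofv3):
    """Relative difference of the means, plus the spread reported by each tool."""
    old_mean, old_cv = summarize(older_tool)
    new_mean, new_cv = summarize(rocprofv3)
    return {
        "mean_rel_diff": (old_mean - new_mean) / new_mean,
        "older_cv": old_cv,
        "rocprofv3_cv": new_cv,
    }


if __name__ == "__main__":
    # Synthetic placeholder values for illustration only.
    older = [100, 120, 90, 110]
    newer = [100, 101, 99, 100]
    result = compare(older, newer)
    print(f"difference in mean duration: {result['mean_rel_diff']:+.1%}")
    print(f"coefficient of variation: older {result['older_cv']:.2%}, "
          f"rocprofv3 {result['rocprofv3_cv']:.2%}")
```

Reporting a coefficient of variation lets you compare spreads independently of kernel length. It also matches the point about tighter variance, not just the difference in averages.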
@@ -143,13 +143,13 @@ The following table lists the commonly used ``rocprofv3`` command-line options c
|
|||||||
- | ``--kernel-include-regex`` REGULAR_EXPRESSION |br| |br| |br| |br|
|
- | ``--kernel-include-regex`` REGULAR_EXPRESSION |br| |br| |br| |br|
|
||||||
| ``--kernel-exclude-regex`` REGULAR_EXPRESSION |br| |br| |br| |br|
|
| ``--kernel-exclude-regex`` REGULAR_EXPRESSION |br| |br| |br| |br|
|
||||||
| ``--kernel-iteration-range`` KERNEL_ITERATION_RANGE [KERNEL_ITERATION_RANGE ...] |br| |br|
|
| ``--kernel-iteration-range`` KERNEL_ITERATION_RANGE [KERNEL_ITERATION_RANGE ...] |br| |br|
|
||||||
| ``-p`` (START_DELAY_TIME):(COLLECTION_TIME):(REPEAT) [(START_DELAY_TIME):(COLLECTION_TIME):(REPEAT) ...] \| ``--collection-period`` (START_DELAY_TIME):(COLLECTION_TIME):(REPEAT) [(START_DELAY_TIME):(COLLECTION_TIME):(REPEAT) ...] |br| |br| |br| |br| |br| |br| |br| |br| |br| |br| |br| |br| |br| |br| |br|
|
| ``-P`` (START_DELAY_TIME):(COLLECTION_TIME):(REPEAT) [(START_DELAY_TIME):(COLLECTION_TIME):(REPEAT) ...] \| ``--collection-period`` (START_DELAY_TIME):(COLLECTION_TIME):(REPEAT) [(START_DELAY_TIME):(COLLECTION_TIME):(REPEAT) ...] |br| |br| |br| |br| |br| |br| |br| |br| |br| |br| |br| |br| |br| |br| |br|
|
||||||
| ``--collection-period-unit`` {hour,min,sec,msec,usec,nsec}
|
| ``--collection-period-unit`` {hour,min,sec,msec,usec,nsec}
|
||||||
- | Filters counter-collection and thread-trace data to include the kernels matching the specified regular expression. Non-matching kernels are excluded. |br| |br|
|
- | Filters counter-collection and thread-trace data to include the kernels matching the specified regular expression. Non-matching kernels are excluded. |br| |br|
|
||||||
| Filters counter-collection and thread-trace data to exclude the kernels matching the specified regular expression. It is applied after ``--kernel-include-regex`` option. |br| |br|
|
| Filters counter-collection and thread-trace data to exclude the kernels matching the specified regular expression. It is applied after ``--kernel-include-regex`` option. |br| |br|
|
||||||
| Specifies iteration range for each kernel matching the filter [start-stop]. |br| |br| |br|
|
| Specifies iteration range for each kernel matching the filter [start-stop]. |br| |br| |br|
|
||||||
| START_DELAY_TIME\: Time in seconds before the data collection begins. |br| COLLECTION_TIME\: Duration of data collection in seconds. |br| REPEAT\: Number of times the data collection cycle is repeated. |br| The default unit for time is seconds, which can be changed using the ``--collection-period-unit`` or ``-pu`` option. To repeat the cycle indefinitely, specify ``repeat`` as 0. You can specify multiple configurations, each defined by a triplet in the format ``start_delay_time:collection_time:repeat``. For example, the command ``-p 10:10:1 5:3:0`` specifies two configurations, the first one with a start delay time of 10 seconds, a collection time of 10 seconds, and a repeat of 1 (the cycle repeats once), and the second with a start delay time of 5 seconds, a collection time of 3 seconds, and a repeat of 0 (the cycle repeats indefinitely). |br| |br| |br|
|
| START_DELAY_TIME\: Time in seconds before the data collection begins. |br| COLLECTION_TIME\: Duration of data collection in seconds. |br| REPEAT\: Number of times the data collection cycle is repeated. |br| The default unit for time is seconds, which can be changed using the ``--collection-period-unit`` option. To repeat the cycle indefinitely, specify ``repeat`` as 0. You can specify multiple configurations, each defined by a triplet in the format ``start_delay_time:collection_time:repeat``. For example, the command ``-P 10:10:1 5:3:0`` specifies two configurations, the first one with a start delay time of 10 seconds, a collection time of 10 seconds, and a repeat of 1 (the cycle repeats once), and the second with a start delay time of 5 seconds, a collection time of 3 seconds, and a repeat of 0 (the cycle repeats indefinitely). |br| |br| |br|
|
||||||
| To change the unit of time used in ``--collection-period`` or ``-p``, specify the desired unit using the ``--collection-period-unit`` or ``-pu`` option. The available units are ``hour`` for hours, ``min`` for minutes, ``sec`` for seconds, ``msec`` for milliseconds, ``usec`` for microseconds, and ``nsec`` for nanoseconds.
|
| To change the unit of time used in ``--collection-period`` or ``-P``, specify the desired unit using the ``--collection-period-unit`` option. The available units are ``hour`` for hours, ``min`` for minutes, ``sec`` for seconds, ``msec`` for milliseconds, ``usec`` for microseconds, and ``nsec`` for nanoseconds.
|
||||||
|
|
||||||
* - Perfetto-specific
|
* - Perfetto-specific
|
||||||
- | ``--perfetto-backend`` {inprocess,system} |br| |br| |br| |br| |br|
|
- | ``--perfetto-backend`` {inprocess,system} |br| |br| |br| |br| |br|
|
||||||
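The ``-P``/``--collection-period`` syntax and the ``--collection-period-unit`` values in the table above can be sketched as a small parser. It splits each `START_DELAY_TIME:COLLECTION_TIME:REPEAT` triplet and scales the times to nanoseconds. `parse_collection_periods` and `CollectionPeriod` are hypothetical names, not rocprofv3 internals. Accepting only non-negative integers is an assumption of this sketch.

```python
from dataclasses import dataclass

# Nanoseconds per unit for the values listed for --collection-period-unit.
NS_PER_UNIT = {
    "hour": 3_600_000_000_000,
    "min": 60_000_000_000,
    "sec": 1_000_000_000,
    "msec": 1_000_000,
    "usec": 1_000,
    "nsec": 1,
}


@dataclass(frozen=True)
class CollectionPeriod:
    start_delay_ns: int
    collection_time_ns: int
    repeat: int  # 0 means the cycle repeats indefinitely


def parse_collection_periods(specs, unit="sec"):
    """Parse triplets such as ["10:10:1", "5:3:0"] using the given time unit."""
    if unit not in NS_PER_UNIT:
        raise ValueError(f"unknown unit: {unit!r}")
    scale = NS_PER_UNIT[unit]
    periods = []
    for spec in specs:
        parts = spec.split(":")
        if len(parts) != 3:
            raise ValueError(
                f"expected START_DELAY_TIME:COLLECTION_TIME:REPEAT, got {spec!r}")
        delay, duration, repeat = (int(p) for p in parts)
        if min(delay, duration, repeat) < 0:
            raise ValueError(f"negative value in {spec!r}")
        periods.append(CollectionPeriod(delay * scale, duration * scale, repeat))
    return periods


if __name__ == "__main__":
    # Equivalent of `-P 10:10:1 5:3:0` with the default unit (seconds).
    for period in parse_collection_periods(["10:10:1", "5:3:0"]):
        print(period)
```

Converting everything to one integer unit up front keeps later scheduling arithmetic free of unit mixing and floating-point rounding.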
@@ -935,14 +935,14 @@ For the description of the fields in the output file, see :ref:`output-file-fiel
|
|||||||
Iteration based counter multiplexing
|
Iteration based counter multiplexing
|
||||||
++++++++++++++++++++++++++++++++++++
|
++++++++++++++++++++++++++++++++++++
|
||||||
|
|
||||||
Counter multiplexing allows a single run of the program to collect groups of counters. This is useful when the counters you want to collect exceed the hardware limits and you cannot run the program multiple times for collection.
|
Counter multiplexing allows a single run of the program to collect groups of counters. This is useful when the counters you want to collect exceed the hardware limits and you cannot run the program multiple times for collection.
|
||||||
|
|
||||||
This feature is available when using YAML (.yaml/.yml) or JSON (.json) input formats. Two new fields are introduced, ``pmc_groups`` and ``pmc_group_interval``. The ``pmc_groups`` field is used to specify the groups of counters to be collected in each run. The ``pmc_group_interval`` field is used to specify the interval between each group of counters. Interval is per-device and increments per dispatch on the device (i.e. dispatch_id). When the interval is reached the next group is selected.
|
This feature is available when using YAML (.yaml/.yml) or JSON (.json) input formats. Two new fields are introduced, ``pmc_groups`` and ``pmc_group_interval``. The ``pmc_groups`` field is used to specify the groups of counters to be collected in each run. The ``pmc_group_interval`` field is used to specify the interval between each group of counters. Interval is per-device and increments per dispatch on the device (i.e. dispatch_id). When the interval is reached the next group is selected.
|
||||||
|
|
||||||
Here is a sample input.yaml file for specifying counter multiplexing:
|
Here is a sample input.yaml file for specifying counter multiplexing:
|
||||||
|
|
||||||
.. code-block:: yaml
|
.. code-block:: yaml
|
||||||
|
|
||||||
jobs:
|
jobs:
|
||||||
- pmc_groups: [["SQ_WAVES", "GRBM_COUNT"], ["GRBM_GUI_ACTIVE"]]
|
- pmc_groups: [["SQ_WAVES", "GRBM_COUNT"], ["GRBM_GUI_ACTIVE"]]
|
||||||
pmc_group_interval: 4
|
pmc_group_interval: 4
|
||||||
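The text says counter multiplexing also works with JSON input. The sketch below writes the same ``pmc_groups``/``pmc_group_interval`` job as the sample input.yaml in JSON form. It assumes rocprofv3 expects the same top-level ``jobs`` list in JSON as in YAML, which follows from YAML's direct mapping onto JSON but is not spelled out above. The script name in the comment is hypothetical.

```python
import json

# Mirrors the sample input.yaml: one job with two counter groups,
# switching to the next group every 4 dispatches on a device.
config = {
    "jobs": [
        {
            "pmc_groups": [["SQ_WAVES", "GRBM_COUNT"], ["GRBM_GUI_ACTIVE"]],
            "pmc_group_interval": 4,
        }
    ]
}


def to_json(cfg):
    """Serialize the multiplexing config with readable indentation."""
    return json.dumps(cfg, indent=2)


if __name__ == "__main__":
    # Redirect stdout to create the input file, e.g.
    # `python make_input.py > input.json` (hypothetical script name).
    print(to_json(config))
```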
@@ -952,7 +952,7 @@ This sample input will collect the first group of counters (``SQ_WAVES``, ``GRBM
|
|||||||
An example of the interval period for this input is given below:
|
An example of the interval period for this input is given below:
|
||||||
|
|
||||||
.. code-block:: shell
|
.. code-block:: shell
|
||||||
|
|
||||||
Device 1, <Kernel A>, Collect SQ_WAVES, GRBM_COUNT
|
Device 1, <Kernel A>, Collect SQ_WAVES, GRBM_COUNT
|
||||||
Device 1, <Kernel A>, Collect SQ_WAVES, GRBM_COUNT
|
Device 1, <Kernel A>, Collect SQ_WAVES, GRBM_COUNT
|
||||||
Device 1, <Kernel B>, Collect SQ_WAVES, GRBM_COUNT
|
Device 1, <Kernel B>, Collect SQ_WAVES, GRBM_COUNT
|
||||||
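The interval behavior in the example above can be modeled by counting dispatches separately for each device and switching groups every ``pmc_group_interval`` dispatches. This is a minimal sketch, not rocprofv3's implementation. `assign_groups` is a hypothetical helper. Wrapping back to the first group after the last one is an assumption, because the text only says that "the next group is selected".

```python
from collections import defaultdict


def assign_groups(dispatches, pmc_groups, interval):
    """Yield (device, kernel, counters) for each dispatch in order.

    The dispatch count is tracked per device; every `interval` dispatches on a
    device, that device moves on to the next counter group. Wrapping back to
    the first group after the last one is an assumption of this sketch.
    """
    seen = defaultdict(int)  # dispatches observed so far on each device
    for device, kernel in dispatches:
        n = seen[device]
        group = pmc_groups[(n // interval) % len(pmc_groups)]
        seen[device] += 1
        yield device, kernel, group


if __name__ == "__main__":
    groups = [["SQ_WAVES", "GRBM_COUNT"], ["GRBM_GUI_ACTIVE"]]
    trace = [(1, "Kernel A"), (1, "Kernel A"), (1, "Kernel B"), (1, "Kernel B"),
             (1, "Kernel A"), (2, "Kernel A")]
    for device, kernel, counters in assign_groups(trace, groups, interval=4):
        print(f"Device {device}, <{kernel}>, Collect {', '.join(counters)}")
```

In this demo, the fifth dispatch on device 1 switches to ``GRBM_GUI_ACTIVE``. The first dispatch on device 2 still starts with the first group, because each device counts independently.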
@@ -1054,7 +1054,7 @@ The agent index is a unique identifier for each agent in the system. It is used
|
|||||||
- **absolute** == *node_id* - absolute index of the agent regardless of cgroups masking. This is a monotonically increasing number that is incremented for every folder in `/sys/class/kfd/kfd/topology/nodes`. e.g. Agent-0, Agent-2, Agent-4.
|
- **absolute** == *node_id* - absolute index of the agent regardless of cgroups masking. This is a monotonically increasing number that is incremented for every folder in `/sys/class/kfd/kfd/topology/nodes`. e.g. Agent-0, Agent-2, Agent-4.
|
||||||
- **relative** == *logical_node_id* - relative index of the agent accounting for cgroups masking. This is a monotonically increasing number which is incremented for every folder in `/sys/class/kfd/kfd/topology/nodes/` whose properties file was non-empty. e.g. Agent-0, Agent-1, Agent-2
|
- **relative** == *logical_node_id* - relative index of the agent accounting for cgroups masking. This is a monotonically increasing number which is incremented for every folder in `/sys/class/kfd/kfd/topology/nodes/` whose properties file was non-empty. e.g. Agent-0, Agent-1, Agent-2
|
||||||
- **type-relative** == *logical_node_type_id* - relative index of the agent accounting for cgroups masking where indexing starts at zero for each agent type. e.g. CPU-0, GPU-0, GPU-1
|
- **type-relative** == *logical_node_type_id* - relative index of the agent accounting for cgroups masking where indexing starts at zero for each agent type. e.g. CPU-0, GPU-0, GPU-1
|
||||||
|
|
||||||
|
|
||||||
To set the agent index in the output files, use the ``--agent-index`` option. The default value is ``relative``.
|
To set the agent index in the output files, use the ``--agent-index`` option. The default value is ``relative``.
|
||||||
|
|
||||||
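The three indexing schemes listed above can be reproduced from a list of topology nodes. The sketch uses a hypothetical `Node` record in which `visible` stands for "the properties file was non-empty". It reproduces the example labels from the bullets (Agent-0/2/4, Agent-0/1/2, CPU-0/GPU-0/GPU-1). The hyphenated label format follows those bullets; the CSV output shows agents as, for example, ``"GPU 3"``.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: int     # folder index under /sys/class/kfd/kfd/topology/nodes
    agent_type: str  # "CPU" or "GPU"
    visible: bool    # False when the properties file is empty (cgroups-masked)


def agent_labels(nodes, mode="relative"):
    """Map node_id -> label for the absolute, relative, or type-relative scheme."""
    labels = {}
    relative = 0   # counts visible nodes only
    per_type = {}  # per-agent-type counters for type-relative indexing
    for node in sorted(nodes, key=lambda n: n.node_id):
        if not node.visible:
            continue  # masked agents are not profiled, so they get no label
        if mode == "absolute":
            labels[node.node_id] = f"Agent-{node.node_id}"
        elif mode == "relative":
            labels[node.node_id] = f"Agent-{relative}"
        elif mode == "type-relative":
            idx = per_type.get(node.agent_type, 0)
            labels[node.node_id] = f"{node.agent_type}-{idx}"
            per_type[node.agent_type] = idx + 1
        else:
            raise ValueError(f"unknown agent index mode: {mode!r}")
        relative += 1
    return labels


if __name__ == "__main__":
    # Hypothetical topology: nodes 1 and 3 are masked by cgroups.
    topology = [Node(0, "CPU", True), Node(1, "GPU", False), Node(2, "GPU", True),
                Node(3, "GPU", False), Node(4, "GPU", True)]
    for mode in ("absolute", "relative", "type-relative"):
        print(mode, agent_labels(topology, mode))
```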
@@ -1071,19 +1071,19 @@ Here is the `rocm-smi` output:
|
|||||||
|
|
||||||
.. code-block:: shell
|
.. code-block:: shell
|
||||||
|
|
||||||
$ cat kernel_trace.csv
|
$ cat kernel_trace.csv
|
||||||
|
|
||||||
"Kind","Agent_Id","Queue_Id","Stream_Id","Thread_Id","Dispatch_Id","Kernel_Id","Kernel_Name","Correlation_Id","Start_Timestamp","End_Timestamp","Private_Segment_Size","Group_Segment_Size","Workgroup_Size_X","Workgroup_Size_Y","Workgroup_Size_Z","Grid_Size_X","Grid_Size_Y","Grid_Size_Z"
|
"Kind","Agent_Id","Queue_Id","Stream_Id","Thread_Id","Dispatch_Id","Kernel_Id","Kernel_Name","Correlation_Id","Start_Timestamp","End_Timestamp","Private_Segment_Size","Group_Segment_Size","Workgroup_Size_X","Workgroup_Size_Y","Workgroup_Size_Z","Grid_Size_X","Grid_Size_Y","Grid_Size_Z"
|
||||||
"KERNEL_DISPATCH","Agent 7",1,2,15044,1,17,"void addition_kernel<float>(float*, float const*, float const*, int, int)",1,1671247151691610,1671247151718010,0,0,64,1,1,1024,1024,1
|
"KERNEL_DISPATCH","Agent 7",1,2,15044,1,17,"void addition_kernel<float>(float*, float const*, float const*, int, int)",1,1671247151691610,1671247151718010,0,0,64,1,1,1024,1024,1
|
||||||
|
|
||||||
.. code-block:: shell
|
.. code-block:: shell
|
||||||
|
|
||||||
rocprofv3 --kernel-trace --agent-index=type-relative -- <application_path>
|
rocprofv3 --kernel-trace --agent-index=type-relative -- <application_path>
|
||||||
|
|
||||||
.. code-block:: shell
|
.. code-block:: shell
|
||||||
|
|
||||||
$ cat kernel_trace.csv
|
$ cat kernel_trace.csv
|
||||||
|
|
||||||
"Kind","Agent_Id","Queue_Id","Stream_Id","Thread_Id","Dispatch_Id","Kernel_Id","Kernel_Name","Correlation_Id","Start_Timestamp","End_Timestamp","Private_Segment_Size","Group_Segment_Size","Workgroup_Size_X","Workgroup_Size_Y","Workgroup_Size_Z","Grid_Size_X","Grid_Size_Y","Grid_Size_Z"
|
"Kind","Agent_Id","Queue_Id","Stream_Id","Thread_Id","Dispatch_Id","Kernel_Id","Kernel_Name","Correlation_Id","Start_Timestamp","End_Timestamp","Private_Segment_Size","Group_Segment_Size","Workgroup_Size_X","Workgroup_Size_Y","Workgroup_Size_Z","Grid_Size_X","Grid_Size_Y","Grid_Size_Z"
|
||||||
"KERNEL_DISPATCH","GPU 3",1,2,15056,1,17,"void addition_kernel<float>(float*, float const*, float const*, int, int)",1,1671390884499766,1671390884525686,0,0,64,1,1,1024,1024,1
|
"KERNEL_DISPATCH","GPU 3",1,2,15056,1,17,"void addition_kernel<float>(float*, float const*, float const*, int, int)",1,1671390884499766,1671390884525686,0,0,64,1,1,1024,1024,1
|
||||||
|
|
||||||
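Whichever ``--agent-index`` mode you choose, the resulting ``kernel_trace.csv`` can be post-processed with Python's standard `csv` module. The sketch below yields the agent label, the kernel name, and the dispatch duration (``End_Timestamp - Start_Timestamp``, in whatever unit the trace timestamps use). `kernel_durations` is a hypothetical helper. The embedded sample is the type-relative output shown above.

```python
import csv
import io


def kernel_durations(csv_file):
    """Yield (agent, kernel name, end - start) for each row of a kernel trace."""
    for row in csv.DictReader(csv_file):
        start = int(row["Start_Timestamp"])
        end = int(row["End_Timestamp"])
        yield row["Agent_Id"], row["Kernel_Name"], end - start


# Header and row copied from the type-relative kernel_trace.csv sample.
SAMPLE = (
    '"Kind","Agent_Id","Queue_Id","Stream_Id","Thread_Id","Dispatch_Id","Kernel_Id",'
    '"Kernel_Name","Correlation_Id","Start_Timestamp","End_Timestamp",'
    '"Private_Segment_Size","Group_Segment_Size","Workgroup_Size_X",'
    '"Workgroup_Size_Y","Workgroup_Size_Z","Grid_Size_X","Grid_Size_Y","Grid_Size_Z"\n'
    '"KERNEL_DISPATCH","GPU 3",1,2,15056,1,17,'
    '"void addition_kernel<float>(float*, float const*, float const*, int, int)",'
    '1,1671390884499766,1671390884525686,0,0,64,1,1,1024,1024,1\n'
)

if __name__ == "__main__":
    for agent, name, duration in kernel_durations(io.StringIO(SAMPLE)):
        print(f"{agent}: {name} took {duration}")
```

For a real trace, pass an open file instead, e.g. `open("kernel_trace.csv", newline="")`. The quoted ``Kernel_Name`` field contains commas, which `csv.DictReader` handles correctly, unlike a plain `str.split(",")`.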
@@ -1154,7 +1154,7 @@ To enable kernel name truncation, use the ``--truncate-kernels`` option.
|
|||||||
|
|
||||||
rocprofv3 --truncate-kernels --kernel-trace -- <application_path>
|
rocprofv3 --truncate-kernels --kernel-trace -- <application_path>
|
||||||
|
|
||||||
The above command generates a ``kernel_trace.csv`` file with truncated kernel names.
|
The above command generates a ``kernel_trace.csv`` file with truncated kernel names.
|
||||||
|
|
||||||
.. csv-table:: Kernel trace truncated
|
.. csv-table:: Kernel trace truncated
|
||||||
:file: /data/kernel_trace_truncated.csv
|
:file: /data/kernel_trace_truncated.csv
|
||||||
@@ -1361,7 +1361,7 @@ The above command generates an ``%hostname%/%pid%_hip_api_trace.csv`` file.
|
|||||||
Collection period
|
Collection period
|
||||||
+++++++++++++++++++
|
+++++++++++++++++++
|
||||||
|
|
||||||
The collection period is the time interval during which the profiling data is collected. You can specify the collection period using the ``--collection-period`` or ``-p`` option.
|
The collection period is the time interval during which the profiling data is collected. You can specify the collection period using the ``--collection-period`` or ``-P`` option.
|
||||||
Users can specify multiple configurations, each defined by a triplet in the format `start_delay:collection_time:repeat`.
|
Users can specify multiple configurations, each defined by a triplet in the format `start_delay:collection_time:repeat`.
|
||||||
|
|
||||||
The triplet is defined as follows:
|
The triplet is defined as follows:
|
||||||
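To reason about when data is actually collected for a given triplet, the sketch below yields the collection windows for one configuration as offsets from process start. The cycle semantics are assumptions, because the text above does not pin them down. The sketch assumes that each cycle is a delay followed by a collection window, that cycles run back to back, that REPEAT is the total number of cycles, and that 0 means the cycle never stops. `collection_windows` is a hypothetical helper.

```python
from itertools import count, islice


def collection_windows(start_delay, collection_time, repeat):
    """Yield (begin, end) offsets for one start_delay:collection_time:repeat triplet.

    Assumptions: each cycle is a delay followed by a collection window, cycles
    run back to back, `repeat` is the total number of cycles, and 0 means the
    cycle never stops.
    """
    cycles = count() if repeat == 0 else range(repeat)
    t = 0
    for _ in cycles:
        begin = t + start_delay
        end = begin + collection_time
        yield begin, end
        t = end


if __name__ == "__main__":
    print(list(collection_windows(10, 10, 1)))
    # repeat == 0 is unbounded, so take only the first three windows.
    print(list(islice(collection_windows(5, 3, 0), 3)))
```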
@@ -1399,7 +1399,7 @@ The following options are specific to Perfetto tracing and are used to control t
|
|||||||
- **DISCARD**: The buffer stops accepting data once full. Further write attempts are dropped.
|
- **DISCARD**: The buffer stops accepting data once full. Further write attempts are dropped.
|
||||||
|
|
||||||
- **--perfetto-buffer-size KB**: Size of buffer for perfetto output in KB. default: 1 GB. If set, stops the tracing session after N bytes have been written. Used to cap the size of the trace.
|
- **--perfetto-buffer-size KB**: Size of buffer for perfetto output in KB. default: 1 GB. If set, stops the tracing session after N bytes have been written. Used to cap the size of the trace.
|
||||||
|
|
||||||
- **--perfetto-backend {inprocess,system}**: Perfetto data collection backend. 'system' mode requires starting traced and perfetto daemons. By default, Perfetto keeps the full trace buffer(s) in memory.
|
- **--perfetto-backend {inprocess,system}**: Perfetto data collection backend. 'system' mode requires starting traced and perfetto daemons. By default, Perfetto keeps the full trace buffer(s) in memory.
|
||||||
|
|
||||||
- **--perfetto-shmem-size-hint KB**: Perfetto shared memory size hint in KB. default: 64 KB. This option gives you control over shared memory buffer sizing. This option can be tweaked to avoid data loss when data is produced at a higher rate.
|
- **--perfetto-shmem-size-hint KB**: Perfetto shared memory size hint in KB. default: 64 KB. This option gives you control over shared memory buffer sizing. This option can be tweaked to avoid data loss when data is produced at a higher rate.
|
||||||
|
|||||||
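Both Perfetto size options above take values in KB, while the buffer default is quoted as 1 GB. The sketch below converts human-friendly sizes into the KB arguments and builds an argument list that uses only the Perfetto flags named above. It assumes binary units (1 KB = 1024 bytes), so the 1 GB default corresponds to 1048576 KB. `perfetto_args` is a hypothetical helper, and the resulting list still has to be combined with whatever other rocprofv3 options you already use.

```python
# Multipliers assume binary units: 1 MB = 1024 KB, 1 GB = 1024 * 1024 KB.
KB_PER_UNIT = {"KB": 1, "MB": 1024, "GB": 1024 * 1024}


def perfetto_args(buffer_size, buffer_unit="GB", shmem_hint=64, shmem_unit="KB",
                  backend="inprocess"):
    """Return rocprofv3 Perfetto options with sizes expressed in KB."""
    if backend not in ("inprocess", "system"):
        raise ValueError(f"unknown Perfetto backend: {backend!r}")
    return [
        "--perfetto-backend", backend,
        "--perfetto-buffer-size", str(buffer_size * KB_PER_UNIT[buffer_unit]),
        "--perfetto-shmem-size-hint", str(shmem_hint * KB_PER_UNIT[shmem_unit]),
    ]


if __name__ == "__main__":
    # A 2 GB buffer and a 1 MB shared-memory hint, for a high-rate trace.
    print(["rocprofv3", *perfetto_args(2, shmem_hint=1, shmem_unit="MB"),
           "--kernel-trace", "--", "<application_path>"])
```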