Doc updates for AFAR V (#875)
* Doc updates for AFAR V
* doc updates
* Updating All AFAR history
* updating table with --output-format
* updating doc for yaml and json support
* Update source/docs/pc_sampling.md
* Update README.md
* Update source/docs/pc_sampling.md
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
* Update pc_sampling.md
---------
Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
[ROCm/rocprofiler-sdk commit: 6c41b8d73a]
@@ -0,0 +1,77 @@
+# Changelog for ROCprofiler-SDK
+
+Full documentation for ROCprofiler-SDK is available at [Click Here](source/docs/index.md)
+
+## ROCprofiler-SDK for AFAR I
+
+## Added
+
+- HSA API Tracing
+- Kernel Dispatch Tracing
+- Kernel Dispatch Counter Collection
+  - Instances are reported as single dimensions
+  - No serialization
+
+## ROCprofiler-SDK for AFAR II
+
+## Added
+
+- HIP API Tracing
+- ROCTx Tracing
+- Tracing ROCProf Tool V3
+- Packaging Documentation
+- ROCTx start/stop
+- Memory Copy Tracing
+
+
+## ROCprofiler-SDK for AFAR III
+
+## Added
+
+- Kernel Dispatch Counter Collection – (includes serialization and multidimensional instances)
+- Kernel serialization
+- Serialization on/off handling
+- ROCprof Tool Plugin Interface V3 for Counters and Dimensions
+- List metrics support
+- Correlation-id retirement
+- HIP and HSA trace distinction
+  - --hip-runtime-trace For Collecting HIP Runtime API Traces
+  - --hip-compiler-trace For Collecting HIP Compiler generated code Traces
+  - --hsa-core-trace For Collecting HSA API Traces (core API)
+  - --hsa-amd-trace For Collecting HSA API Traces (AMD-extension API)
+  - --hsa-image-trace For Collecting HSA API Traces (Image-extension API)
+  - --hsa-finalizer-trace For Collecting HSA API Traces (Finalizer-extension API)
+
+## ROCprofiler-SDK for AFAR IV
+
+## Added
+
+- Page Migration Reporting (API)
+- Scratch Memory Reporting (API)
+- Kernel Dispatch Callback Tracing (API)
+- External Correlation ID Request Service (API)
+- Buffered counter collection record headers (API)
+- Remove HSA dependency from counter collection (API)
+- rocprofv3 Multi-GPU support in single-process (tool)
+
+## ROCprofiler-SDK for AFAR V
+
+## Added
+
+- Agent/Device Counter Collection (API)
+- JSON output format support (tool)
+- Perfetto output format support (.pftrace) (tool)
+- Input YAML support for counter collection (tool)
+- Input JSON support for counter collection (tool)
+- PC Sampling (Beta) (API)
+- ROCProf V3 Multi-GPU Support:
+  - Merged files
+  - Multi-process (multiple files)
+
+## Fixed
+
+- SQ_ACCUM_PREV and SQ_ACCUM_PREV_HIRE overwriting issue
+
+## Changed
+
+- rocprofv3 tool now needs `--` in front of application. For detailed uses, please [Click Here](source/docs/rocprofv3.md)
@@ -17,6 +17,7 @@ ROCProfiler-SDK is AMD’s new and improved tooling infrastructure, providing a
 - HSA API tracing
 - HSA operation tracing
 - Marker (ROCtx) tracing
+- PC Sampling (Beta)

 ## Tool Support

@@ -65,3 +66,18 @@ Please report in the Github Issues.
 ## Limitations

 - Individual XCC mode is not supported.
+
+- By default, the PC sampling API is disabled. To use PC sampling, set the `ROCPROFILER_PC_SAMPLING_BETA_ENABLED` environment variable, which grants access to the experimental PC sampling beta feature. This feature is still under development and may not be completely stable.
+  **Risk Acknowledgment**:
+  - By activating this environment variable, you acknowledge and accept the following potential risks:
+    - **Hardware Freeze**: This beta feature could cause your hardware to freeze unexpectedly.
+    - **Need for Cold Restart**: In the event of a hardware freeze, you may need to perform a cold restart (turning the hardware off and on) to restore normal operations.
+  Please use this beta feature cautiously. It may affect your system's stability and performance. Proceed at your own risk.
+
+- At this point, we do not recommend stress-testing the beta implementation.
+
+- Correlation IDs provided by the PC sampling service are verified only for HIP API calls.
+
+- Timestamps in PC sampling records might not be 100% accurate.
+
+- Using PC sampling on multi-threaded applications might fail with `HSA_STATUS_ERROR_EXCEPTION`. Furthermore, if three or more threads launch operations to the same agent while PC sampling is enabled, `HSA_STATUS_ERROR_EXCEPTION` might appear.
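
Because the limitations above describe PC sampling as an opt-in beta feature, a launcher script may want to set the switch only for a single child process and only after an explicit acknowledgment. The following Python sketch shows one way to do that. The variable name comes from the text above; the value `"1"` and the helper name `pc_sampling_env` are assumptions made for illustration, since the text only says that the variable must be set.

```python
import os

# Environment variable name taken from the limitations list above.
PC_SAMPLING_ENV_VAR = "ROCPROFILER_PC_SAMPLING_BETA_ENABLED"


def pc_sampling_env(base_env, acknowledge_risk=False, value="1"):
    """Return a copy of base_env with the PC sampling beta switch set.

    Hypothetical helper. The value "1" is an assumption; the documentation
    only states that the variable has to be set.
    """
    if not acknowledge_risk:
        raise ValueError(
            "PC sampling is a beta feature that may freeze the hardware; "
            "pass acknowledge_risk=True to opt in"
        )
    env = dict(base_env)  # copy, so the caller's environment is untouched
    env[PC_SAMPLING_ENV_VAR] = value
    return env


if __name__ == "__main__":
    child_env = pc_sampling_env(os.environ, acknowledge_risk=True)
    print(PC_SAMPLING_ENV_VAR, "=", child_env[PC_SAMPLING_ENV_VAR])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run`, which keeps the beta switch out of the parent shell's environment.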
@@ -13,3 +13,4 @@
 - Simplified management of enabling/disabling one or more data collection services
 - Improved error checking and logging
 - Backwards ABI compatibility (goal)
+- PC Sampling (Beta Implementation)
@@ -12,6 +12,7 @@
 tool_library_overview
 callback_services
 buffered_services
+pc_sampling
 intercept_table
 developer_api
 samples
@@ -0,0 +1,10 @@
+# PC Sampling Method
+
+PC sampling is a profiling method that statistically approximates kernel execution by sampling GPU program counters. The method periodically chooses an active wave (in a round-robin manner) and snapshots its program counter (PC). The process takes place on every compute unit simultaneously, which makes it device-wide PC sampling. The outcome is a histogram of samples that shows how many times each kernel instruction was sampled.
+
+**Note**: The PC sampling feature is still under development and may not be completely stable.
+**Risk Acknowledgment**:
+- By activating this feature through the `ROCPROFILER_PC_SAMPLING_BETA_ENABLED` environment variable, you acknowledge and accept the following potential risks:
+  - **Hardware Freeze**: This beta feature could cause your hardware to freeze unexpectedly.
+  - **Need for Cold Restart**: In the event of a hardware freeze, you may need to perform a cold restart (turning the hardware off and on) to restore normal operations.
+Please use this beta feature cautiously. It may affect your system's stability and performance. Proceed at your own risk.
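
To make the sampling scheme described in this file concrete, here is a toy Python model of it. It is only a sketch under simplifying assumptions: each wave is represented by a made-up list of program counters (one per tick), a wave counts as active while its list has an entry for the current tick, and the function names are hypothetical. It does not use or represent the ROCprofiler-SDK API or real hardware behavior.

```python
from collections import Counter


def sample_compute_unit(wave_traces, period):
    """Sample one compute unit every `period` ticks.

    wave_traces[i][t] is the PC of wave i at tick t; a wave is active
    at tick t while t < len(wave_traces[i]).
    """
    histogram = Counter()
    last = -1  # index of the most recently sampled wave
    n = len(wave_traces)
    longest = max((len(trace) for trace in wave_traces), default=0)
    for tick in range(0, longest, period):
        # Round robin: start after the last sampled wave, take the next active one.
        for step in range(1, n + 1):
            candidate = (last + step) % n
            if tick < len(wave_traces[candidate]):
                histogram[wave_traces[candidate][tick]] += 1
                last = candidate
                break
    return histogram


def sample_device(compute_units, period):
    """Merge per-compute-unit histograms into one device-wide histogram."""
    total = Counter()
    for wave_traces in compute_units:
        total.update(sample_compute_unit(wave_traces, period))
    return total


if __name__ == "__main__":
    cu0 = [[0x100, 0x104, 0x108, 0x10C], [0x100, 0x100, 0x104, 0x104]]
    cu1 = [[0x100, 0x104]]
    for pc, count in sorted(sample_device([cu0, cu1], period=1).items()):
        print(hex(pc), count)
```

Instructions with higher counts in the merged histogram are the ones on which waves were observed most often.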
@@ -130,7 +130,6 @@ Below is the list of `rocprofv3` command-line options. Some options are used for

 | Option | Description | Use |
 |--------|-------------|-----|
-| -d \| --output-directory | Specifies the path for the output files. | Output control |
 | --hip-trace | Collects HIP runtime traces. | Application tracing |
 | --hip-runtime-trace | Collects HIP runtime API traces. | Application tracing |
 | --hip-compiler-trace | Collects HIP compiler-generated code traces. | Application tracing |
@@ -140,16 +139,18 @@ Below is the list of `rocprofv3` command-line options. Some options are used for
 | --hsa-amd-trace | Collects HSA API traces (AMD-extension API). | Application tracing |
 | --hsa-image-trace | Collects HSA API traces (Image-extension API). | Application tracing |
 | --hsa-finalizer-trace | Collects HSA API traces (Finalizer-extension API). | Application tracing |
-| -i | Specifies the input file. | Kernel profiling |
-|--stats | For Collecting statistics of enabled tracing types | Application tracing |
-| -L \| --list-metrics | List metrics for counter collection. | Kernel profiling |
+| --stats | Collects statistics of enabled tracing types. | Application tracing |
 | --kernel-trace | Collects kernel dispatch traces. | Application tracing |
-| -M \| --mangled-kernels | Overrides the default demangling of kernel names. | Output control |
 | --marker-trace | Collects marker (ROC-TX) traces. | Application tracing |
 | --memory-copy-trace | Collects memory copy traces. | Application tracing |
-| -o \| --output-file | Specifies the name of the output file. Note that this name is appended to the default names (_api_trace or counter_collection.csv) of the generated files'. | Output control |
 | --sys-trace | Collects HIP, HSA, memory copy, marker, and kernel dispatch traces. | Application tracing |
+| -i | Specifies the input file. | Kernel profiling |
+| -L \| --list-metrics | Lists metrics for counter collection. | Kernel profiling |
+| -d \| --output-directory | Specifies the path for the output files. | Output control |
+| -o \| --output-file | Specifies the name of the output file. Note that this name is appended to the default names (_api_trace or counter_collection.csv) of the generated files. | Output control |
+| -M \| --mangled-kernels | Overrides the default demangling of kernel names. | Output control |
 | -T \| --truncate-kernels | Truncates the demangled kernel names for improved readability. | Output control |
+| --output-format | Specifies the output format (supported formats: csv, json, pftrace). | Output control |
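
As an illustration of how the options in the table above might be combined, the following Python sketch assembles a `rocprofv3` argument list without running it. It uses only options listed in the table and the `--` separator that the changelog in this commit says must precede the application. The helper name `build_rocprofv3_command`, the application path `./my_app`, and the choice to pass each option value as a separate argument are assumptions made for this example.

```python
import shlex

# Output formats listed for --output-format in the table above.
SUPPORTED_FORMATS = ("csv", "json", "pftrace")


def build_rocprofv3_command(app_argv, trace_flags=(), output_format=None,
                            output_directory=None, output_file=None,
                            input_file=None):
    """Assemble an argument list for rocprofv3 (hypothetical helper).

    Passing option values as separate arguments is an assumption.
    """
    if not app_argv:
        raise ValueError("an application command is required")
    cmd = ["rocprofv3", *trace_flags]
    if input_file is not None:
        cmd += ["-i", input_file]
    if output_directory is not None:
        cmd += ["-d", output_directory]
    if output_file is not None:
        cmd += ["-o", output_file]
    if output_format is not None:
        if output_format not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported output format: {output_format}")
        cmd += ["--output-format", output_format]
    cmd.append("--")  # separator required in front of the application
    cmd.extend(app_argv)
    return cmd


if __name__ == "__main__":
    command = build_rocprofv3_command(
        ["./my_app"],  # hypothetical application
        trace_flags=["--kernel-trace", "--memory-copy-trace"],
        output_format="json",
        output_directory="out",
    )
    print(shlex.join(command))
```

Building the command as a list rather than a single string avoids shell quoting problems when the list is later handed to `subprocess.run`.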

 You can also see all the `rocprofv3` options using:

@@ -405,7 +406,7 @@ For more information on counters available on MI200, refer to the [MI200 Perform

 #### Input file

-To collect the desired basic counters or derived metrics, you can just mention them in an input file below. The line consisting of the counter or metric names must begin with `pmc`. We support input file in text(.txt extension) or yaml(.yaml/.yml) format.
+To collect the desired basic counters or derived metrics, list them in an input file as shown below. The line containing the counter or metric names must begin with `pmc`. Input files are supported in text (.txt), YAML (.yaml/.yml), and JSON (.json) formats.

 ```bash
 $ cat input.txt
@@ -415,7 +416,7 @@ pmc: GRBM_GUI_ACTIVE

 OR

-$ cat input.yaml
+$ cat input.json
 {
   "metrics": [
     {
@@ -426,6 +427,22 @@ $ cat input.yaml
     }
   ]
 }
+
+OR
+
+$ cat input.yaml
+
+metrics:
+  - pmc:
+      - SQ_WAVES
+      - GRBM_COUNT
+      - GUI_ACTIVE
+      - 'TCC_HIT[1]'
+      - 'TCC_HIT[2]'
+  - pmc:
+      - FETCH_SIZE
+      - WRITE_SIZE
+
 ```

 The GPU hardware resources limit the number of basic counters or derived metrics that can be collected in one profiling run. If too many counters or metrics are selected, the kernels need to be executed multiple times to collect them. For multi-pass execution, include multiple `pmc` rows in the input file. The counters or metrics in each `pmc` row are collected in a single kernel run.
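
The multi-pass layout described above can be generated programmatically. The Python sketch below splits a counter list into `pmc` rows and writes them as a JSON input document. Two things here are assumptions: the JSON layout (`metrics` holding objects with a `pmc` list) is inferred from the YAML example, because the JSON body is elided in this diff, and `max_per_pass` is a hypothetical stand-in for the real hardware limit, which depends on the GPU and on the counters chosen.

```python
import json


def build_input(counters, max_per_pass):
    """Group counters into pmc rows of at most max_per_pass entries.

    Hypothetical helper; the layout mirrors the YAML example and is an
    assumption for the JSON form.
    """
    if max_per_pass < 1:
        raise ValueError("max_per_pass must be at least 1")
    rows = [counters[i:i + max_per_pass]
            for i in range(0, len(counters), max_per_pass)]
    return {"metrics": [{"pmc": row} for row in rows]}


if __name__ == "__main__":
    wanted = ["SQ_WAVES", "GRBM_COUNT", "GUI_ACTIVE", "FETCH_SIZE", "WRITE_SIZE"]
    document = build_input(wanted, max_per_pass=3)
    with open("input.json", "w", encoding="utf-8") as handle:
        json.dump(document, handle, indent=2)
    print(json.dumps(document, indent=2))
```

Each resulting `pmc` row corresponds to one pass, so five counters with a limit of three per pass yield two rows.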