Minor editorial changes data type selection feature (#816)
+4
-4
@@ -25,11 +25,11 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
 * -b option in profile mode also accept hardware IP block for filtering, however, this support will be deprecated soon
 * --list-metrics option added in profile mode to list possible metric id(s), similar to analyze mode

-* Datatype selection option for roofline profiling
-* --roofline-data-type / -R option added to specify which datatypes the user wants to capture in the roofline PDF plot outputs
+* Data type selection option for roofline profiling
+* --roofline-data-type / -R option added to specify which data types the user wants to capture in the roofline PDF plot outputs
 * Default is FP32, but user can specify as many types as desired to overlay on the same plot output

-* Additional datatypes for roofline profiling
+* Additional data types for roofline profiling
 * Now supports FP4, FP6, FP8, FP16, BF16, FP32, FP64, I8, I32, I64 (dependent on gpu architecture)

 * Support host-trap PC Sampling on CLI (beta version)
@@ -40,7 +40,7 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
 * Scheduler-Pipe Wave Utilization
 * Scheduler FIFO Full Rate
 * CPC ADC Utilization
-* F6F4 datatype metrics
+* F6F4 data type metrics
 * Update formula for total FLOPs while taking into account F6F4 ops
 * LDS STORE, LDS LOAD, LDS ATOMIC instruction count metrics
 * LDS STORE, LDS LOAD, LDS ATOMIC bandwidth metrics
@@ -76,9 +76,9 @@ application's profiling data:
 #. Memory Chart Analysis
 #. Empirical Roofline Analysis

-Use ``--roofline-data-type`` option to specify which datatype(s) you would like displayed on the roofline PDFs in the standalone analysis GUI.
-Datatypes can be stacked- for example, "--roofline-data-type FP32 FP64 I32" would display one PDF with FP32 and FP64 stacked, and one PDF with INT32.
-Default roofline datatype plotted is FP32.
+Use ``--roofline-data-type`` option to specify which data type(s) you would like displayed on the roofline PDFs in the standalone analysis GUI.
+Data types can be stacked- for example, "--roofline-data-type FP32 FP64 I32" would display one PDF with FP32 and FP64 stacked, and one PDF with INT32.
+Default roofline data type plotted is FP32.

 #. Top Stats (Top Kernel Statistics)
 #. System Info
@@ -197,7 +197,7 @@ an Instinct MI210 vs an Instinct MI250.
 ``sysinfo.csv``, is created to reflect the target device settings. All
 profiling output is stored in ``log.txt``. Roofline-specific benchmark
 results are stored in ``roofline.csv`` and roofline plots are outputted into PDFs as
-``empirRoof_gpu-0_[datatype1]_..._[datatypeN].pdf`` where datatypes requested through
+``empirRoof_gpu-0_[datatype1]_..._[datatypeN].pdf`` where data types requested through
 ``--roofline-data-type`` option are listed in the file name.

 .. code-block:: shell-session
@@ -477,11 +477,11 @@ Roofline options
 running a roofline benchmark on your system.

 ``--roofline-data-type <datatype>``
-Allows you to specify datatypes that you want plotted in the roofline PDF output(s). Selecting more than one datatype will overlay the results onto the same plot. Default: FP32
+Allows you to specify data types that you want plotted in the roofline PDF output(s). Selecting more than one data type will overlay the results onto the same plot. Default: FP32

 .. note::

-For more information on datatypes supported based on the GPU architecture, see :doc:`../../conceptual/performance-model`
+For more information on data types supported based on the GPU architecture, see :doc:`../../conceptual/performance-model`

 To distinguish different kernels in your ``.pdf`` roofline plot use
 ``--kernel-names``. This will give each kernel a unique marker identifiable from
@@ -525,7 +525,7 @@ successfully.

 .. note::

-* ROCm Compute Profiler currently captures roofline profiling for all data types, but has the ability to reduce clutter in the PDF outputs by selecting datatype(s). Selecting multiple datatypes will overlay the results into the same PDF. If the user would like separate PDFs for each datatype off of the same workload run, the user can run the profiling command again with the single datatype as long as the roofline.csv still exists in the workload folder.
+* ROCm Compute Profiler currently captures roofline profiling for all data types, and you can reduce the clutter in the PDF outputs by filtering the data type(s). Selecting multiple data types will overlay the results into the same PDF. To generate results in separate PDFs for each data type from the same workload run, you can re-run the profiling command with each data type as long as the ``roofline.csv`` file still exists in the workload folder.
 * Roofline feature is currently not enabled on AMD Instinct MI350.

 The following image is a sample ``empirRoof_gpu-0_FP32.pdf`` roofline
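
The file-naming rule described in the changed text (``empirRoof_gpu-0_[datatype1]_..._[datatypeN].pdf``, with floating-point and integer types landing in separate PDFs) can be illustrated with a small sketch. This is not ROCm Compute Profiler's implementation: the function name ``roofline_pdf_names`` is hypothetical, and the float/integer grouping rule is an assumption inferred from the single ``FP32 FP64 I32`` example above.

```python
def roofline_pdf_names(data_types=("FP32",), gpu_index=0):
    """Return illustrative roofline PDF names for the requested data types.

    Assumption (not confirmed by the source): types whose name starts with
    "I" (I8, I32, I64) are plotted together in one PDF, and all other types
    (FP4 ... FP64, BF16) are plotted together in another.
    """
    float_types = [t for t in data_types if not t.upper().startswith("I")]
    int_types = [t for t in data_types if t.upper().startswith("I")]

    names = []
    for group in (float_types, int_types):
        if group:  # skip a group when no type of that kind was requested
            names.append(f"empirRoof_gpu-{gpu_index}_{'_'.join(group)}.pdf")
    return names


if __name__ == "__main__":
    # Default selection, matching the documented FP32 default.
    print(roofline_pdf_names())
    # The example from the documentation text.
    print(roofline_pdf_names(["FP32", "FP64", "I32"]))
```

With the default argument the sketch yields the same ``empirRoof_gpu-0_FP32.pdf`` name that the documentation uses for its sample image.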