[rocprof-compute] remove references to --kernel-names (#1543)

* remove references to --kernel-names

* ruff format

* remove redundant comments

* update docs and roofline image

* added two output lines to docs
jamessiddeley-amd
2025-11-10 11:47:39 -05:00
Committed by GitHub
Parent 60b81681c0
Commit 42cc721a4b
8 changed files with 290 additions and 51 deletions
+1 -2
@@ -105,7 +105,6 @@ Standalone Roofline Options:
vL1D
LDS
--device GPU device ID. (DEFAULT: ALL)
--kernel-names Include kernel names in roofline plot.
```
The following sample command profiles the *vcopy* workload.
@@ -377,7 +376,7 @@ Standalone Roofline Options:
- The `--device` \<gpu_id> allows you to specify a device id to collect performance data from when running our roofline benchmark on your system.
- If you'd like to distinguish different kernels in your .pdf roofline plot use `--kernel-names`. This will give each kernel a unique marker identifiable from the plot's key.
- Each kernel in your .pdf roofline plot is automatically distinguished with a unique marker identifiable from the plot's key.
#### Roofline Only
+1 -1
@@ -315,7 +315,7 @@ Standalone Roofline Options:
- The `--device` \<gpu_id> allows you to specify a device id to collect performance data from when running our roofline benchmark on your system.
- If you would like to distinguish different kernels in your .pdf roofline plot use `--kernel-names`. This will give each kernel a unique marker identifiable from the plot's key.
- Each kernel in your .pdf roofline plot is automatically distinguished with a unique marker identifiable from the plot's key.
#### Roofline Only
Binary file not shown. Before: 64 KiB; after: 160 KiB.

+43 -22
@@ -583,9 +583,11 @@ Roofline options
For more information on data types supported based on the GPU architecture, see :doc:`../../conceptual/performance-model`
To distinguish different kernels in your ``.pdf`` roofline plot use
``--kernel-names``. This will give each kernel a unique marker identifiable from
the plot's key.
Each kernel in your ``.pdf`` roofline plot is automatically distinguished with a unique marker identifiable from the plot's key. The roofline PDF includes an integrated multi-subplot layout with:
1. **Roofline Plot** - Shows performance ceilings and kernel arithmetic intensity points
2. **Plot Points & Values Table** - Displays AI values, performance metrics, memory/compute bound status, and cache levels for each kernel
3. **Full Kernel Names Table** - Lists complete kernel names with their corresponding plot markers
Roofline only
@@ -595,33 +597,52 @@ The following example demonstrates profiling roofline data only:
.. code-block:: shell-session
$ rocprof-compute profile --name vcopy --roof-only -- ./vcopy -n 1048576 -b 256
$ rocprof-compute profile --name occupancy --roof-only -- ./tests/occupancy -n 1048576 -b 256
__ _
_ __ ___ ___ _ __ _ __ ___ / _| ___ ___ _ __ ___ _ __ _ _| |_ ___
| '__/ _ \ / __| '_ \| '__/ _ \| |_ _____ / __/ _ \| '_ ` _ \| '_ \| | | | __/ _ \
| | | (_) | (__| |_) | | | (_) | _|_____| (_| (_) | | | | | | |_) | |_| | || __/
|_| \___/ \___| .__/|_| \___/|_| \___\___/|_| |_| |_| .__/ \__,_|\__\___|
|_| |_|
...
[roofline] Checking for roofline.csv in /home/auser/repos/rocprofiler-compute/sample/workloads/vcopy/MI200
[roofline] No roofline data found. Generating...
Checking for roofline.csv in /home/auser/repos/rocprofiler-compute/sample/workloads/vcopy/MI200
INFO [roofline] Generating pmc_perf.csv (roofline counters only).
INFO Rocprofiler-Compute version: 3.3.0
INFO Profiler choice: rocprofiler-sdk
INFO Path: /app/projects/rocprofiler-compute/workloads/occupancy/MI300X_A1
INFO Target: MI300X_A1
INFO Command: ./tests/occupancy -n 1048576 -b 256
INFO Kernel Selection: None
INFO Dispatch Selection: None
INFO Filtered sections: ['4']
INFO
INFO ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
INFO Collecting Performance Counters (Roofline Only)
INFO ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
INFO
INFO [Run 1/3][Approximate profiling time left: pending first measurement...]
INFO [profiling] Current input file: /app/projects/rocprofiler-compute/workloads/occupancy/MI300X_A1/perfmon/pmc_perf_0.txt
...
INFO [roofline] Checking for roofline.csv in /app/projects/rocprofiler-compute/workloads/occupancy/MI300X_A1
INFO [roofline] No roofline data found. Generating...
Empirical Roofline Calculation
Copyright © 2022 Advanced Micro Devices, Inc. All rights reserved.
Total detected GPU devices: 4
GPU Device 0: Profiling...
99% [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
...
Empirical Roofline PDFs saved!
Copyright © 2025 Advanced Micro Devices, Inc. All rights reserved.
Total detected GPU devices: 8
GPU Device 0 (gfx942) with 304 CUs: Profiling...
99% [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
...
An inspection of our workload output folder shows ``.pdf`` plots were generated
successfully.
.. code-block:: shell-session
$ ls workloads/vcopy/MI200/
$ ls workloads/occupancy/MI300X_A1
total 48
-rw-r--r-- 1 auser agroup 13331 Mar 1 16:05 empirRoof_gpu-0_FP32.pdf
drwxr-xr-x 1 auser agroup 0 Mar 1 16:03 perfmon
-rw-r--r-- 1 auser agroup 1101 Mar 1 16:03 pmc_perf.csv
-rw-r--r-- 1 auser agroup 1715 Mar 1 16:05 roofline.csv
-rw-r--r-- 1 auser agroup 650 Mar 1 16:03 sysinfo.csv
-rw-r--r-- 1 auser agroup 399 Mar 1 16:03 timestamps.csv
-rw-r--r-- 1 auser agroup 13331 Oct 29 10:33 empirRoof_gpu-0_FP32.pdf
drwxr-xr-x 1 auser agroup 0 Oct 29 10:33 perfmon
-rw-r--r-- 1 auser agroup 1101 Oct 29 10:33 pmc_perf.csv
-rw-r--r-- 1 auser agroup 1715 Oct 29 10:33 roofline.csv
-rw-r--r-- 1 auser agroup 650 Oct 29 10:33 sysinfo.csv
-rw-r--r-- 1 auser agroup 399 Oct 29 10:33 timestamps.csv
.. note::
-7
@@ -439,13 +439,6 @@ Examples:
type=int,
help="\t\t\tTarget GPU device ID. (DEFAULT: 0)",
)
roofline_group.add_argument(
"--kernel-names",
required=False,
default=False,
action="store_true",
help="\t\t\tInclude kernel names in roofline plot.",
)
roofline_group.add_argument(
"-R",
"--roofline-data-type",
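The hunk above deletes the `--kernel-names` option from the roofline argument group. An older command line that still passes the flag would then be rejected by argparse. This assumes the tool calls `parse_args()` rather than `parse_known_args()`, which this diff does not show.

The sketch below is a minimal reproduction using a hypothetical stand-alone parser. It is not rocprof-compute's actual parser, and the program name and option set are assumptions:

```python
import argparse
import contextlib
import io


def build_parser():
    # Hypothetical stand-in for the roofline options after this commit:
    # --device and --roof-only remain, --kernel-names no longer exists.
    parser = argparse.ArgumentParser(prog="rocprof-compute-sketch")
    roofline_group = parser.add_argument_group("Standalone Roofline Options")
    roofline_group.add_argument("--device", type=int, default=0)
    roofline_group.add_argument("--roof-only", action="store_true")
    return parser


def try_parse(argv):
    """Return (exit_code, namespace); exit_code is None when parsing succeeds."""
    parser = build_parser()
    captured = io.StringIO()
    try:
        # Silence argparse's usage/error message for the demo.
        with contextlib.redirect_stderr(captured):
            ns = parser.parse_args(argv)
    except SystemExit as exc:
        # argparse reports "unrecognized arguments" and exits with status 2.
        return exc.code, None
    return None, ns


if __name__ == "__main__":
    print(try_parse(["--device", "0", "--roof-only"]))
    print(try_parse(["--device", "0", "--roof-only", "--kernel-names"]))
```

Scripts that still pass `--kernel-names` can simply drop it. According to the updated docs, each kernel in the roofline PDF now gets a unique marker automatically.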
+1 -1
@@ -2,7 +2,7 @@
declare -A commands=(
[path]=' '
[no_roof]='--no-roof'
[kernel_names]='--roof-only --kernel-names'
[kernel_names]='--roof-only'
[device_filter]='--device 0'
[kernel]='--kernel "vecCopy(double*, double*, double*, int, int) [clone .kd]"'
[ipblocks_SQ]='-b SQ'
+244 -17
@@ -587,14 +587,21 @@ def test_path_rocpd(
@pytest.mark.roofline
def test_roof_kernel_names(binary_handler_profile_rocprof_compute):
def test_roof_basic_validation(binary_handler_profile_rocprof_compute):
"""
Test basic roofline PDF generation with full validation pipeline.
This test runs the complete validation flow including counter logging
and metric comparison (if enabled in config). Validates that roofline PDFs
are generated with the integrated multi-subplot layout (roofline plot +
plot points table + kernel names table).
"""
if soc in ("MI100"):
# roofline is not supported on MI100
assert True
# Do not continue testing
return
options = ["--device", "0", "--roof-only", "--kernel-names"]
options = ["--device", "0", "--roof-only"]
workload_dir = test_utils.get_output_dir()
returncode = binary_handler_profile_rocprof_compute(
config, workload_dir, options, check_success=False, roof=True
@@ -631,7 +638,6 @@ def test_roof_multiple_data_types(binary_handler_profile_rocprof_compute):
"--device",
"0",
"--roof-only",
"--kernel-names",
"--roofline-data-type",
dtype,
]
@@ -666,7 +672,6 @@ def test_roof_invalid_data_type(binary_handler_profile_rocprof_compute):
"--device",
"0",
"--roof-only",
"--kernel-names",
"--roofline-data-type",
"INVALID_TYPE",
]
@@ -815,7 +820,6 @@ def test_roofline_workload_dir_not_set_error():
class MockArgs:
def __init__(self):
self.roof_only = True
self.kernel_names = False
self.mem_level = "ALL"
self.sort = "ALL"
self.roofline_data_type = ["FP32"]
@@ -828,7 +832,6 @@ def test_roofline_workload_dir_not_set_error():
"device_id": 0,
"sort_type": "kernels",
"mem_level": "ALL",
"include_kernel_names": False,
"is_standalone": True,
"roofline_data_type": ["FP32"],
}
@@ -879,8 +882,15 @@ def test_roof_workload_dir_validation(binary_handler_profile_rocprof_compute):
@pytest.mark.roofline
def test_roofline_empty_kernel_names_handling(binary_handler_profile_rocprof_compute):
"""
Test empirical_roofline() when num_kernels == 0
This should trigger the "No kernel names found" log message
Test roofline behavior when kernel filter doesn't match any
kernels during initial profiling.
When profiling with a non-matching kernel filter, no counter
data is collected, so roofline generation is skipped with a
warning (but returns success code 0).
This is different from filtering existing profiling data with
a non-matching kernel name, which produces an explicit error.
"""
if soc in ("MI100"):
pytest.skip("Skipping roofline test for MI100")
@@ -890,14 +900,20 @@ def test_roofline_empty_kernel_names_handling(binary_handler_profile_rocprof_com
"--device",
"0",
"--roof-only",
"--kernel-names",
"--kernel",
"nonexistent_kernel_name_that_should_not_match_anything",
]
workload_dir = test_utils.get_output_dir()
returncode = binary_handler_profile_rocprof_compute( # noqa: F841
config, workload_dir, options, check_success=True, roof=True
returncode = binary_handler_profile_rocprof_compute(
config, workload_dir, options, check_success=False, roof=True
)
assert returncode == 0, f"Expected success (returncode=0), got {returncode}"
pdf_files = list(Path(workload_dir).glob("empirRoof_*.pdf"))
assert len(pdf_files) == 0, (
"No roofline PDF should be generated when no kernels match"
)
test_utils.clean_output_dir(config["cleanup"], workload_dir)
@@ -918,7 +934,6 @@ def test_roofline_kernel_filter(binary_handler_profile_rocprof_compute):
"--device",
"0",
"--roof-only",
"--kernel-names",
]
workload_dir = test_utils.get_output_dir()
@@ -991,7 +1006,6 @@ def test_roof_plot_modes(binary_handler_profile_rocprof_compute):
"--device",
"0",
"--roof-only",
"--kernel-names",
"--kernel",
config["kernel_name_1"],
],
@@ -1083,7 +1097,6 @@ def test_roofline_missing_file_handling(binary_handler_profile_rocprof_compute):
class MockArgs:
def __init__(self):
self.roof_only = True
self.kernel_names = False
self.mem_level = "ALL"
self.sort = "ALL"
self.roofline_data_type = ["FP32"]
@@ -1098,7 +1111,6 @@ def test_roofline_missing_file_handling(binary_handler_profile_rocprof_compute):
"device_id": 0,
"sort_type": "kernels",
"mem_level": "ALL",
"include_kernel_names": False,
"is_standalone": True,
"roofline_data_type": ["FP32"],
}
@@ -1137,7 +1149,6 @@ def test_roofline_invalid_datatype_cli(binary_handler_profile_rocprof_compute):
class MockArgs:
def __init__(self):
self.roof_only = True
self.kernel_names = False
self.mem_level = "ALL"
self.sort = "ALL"
self.roofline_data_type = ["FP32"]
@@ -1150,7 +1161,6 @@ def test_roofline_invalid_datatype_cli(binary_handler_profile_rocprof_compute):
"device_id": 0,
"sort_type": "kernels",
"mem_level": "ALL",
"include_kernel_names": False,
"is_standalone": True,
"roofline_data_type": ["FP32"],
}
@@ -1187,6 +1197,223 @@ def test_roofline_ceiling_data_validation(binary_handler_profile_rocprof_compute
test_utils.clean_output_dir(config["cleanup"], workload_dir)
@pytest.mark.roofline
def test_roofline_plot_points_data_generation():
"""
Test that plot points data structure is correctly generated with:
- Symbol assignments
- AI values (FLOPs/Byte)
- Performance values (GFLOPs/s)
- Memory/Compute bound status
- Cache level information
"""
if soc in ("MI100"):
pytest.skip("Skipping roofline test for MI100")
return
sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
try:
from roofline import Roofline
from utils.specs import generate_machine_specs
class MockArgs:
def __init__(self):
self.roof_only = True
self.mem_level = "ALL"
self.sort = "ALL"
self.roofline_data_type = ["FP32"]
args = MockArgs()
mspec = generate_machine_specs(None, None)
mock_ai_data = {
"ai_l1": [[0.5, 1.2], [100.0, 150.0]],
"ai_l2": [[0.3, 0.8], [80.0, 120.0]],
"ai_hbm": [[0.1, 0.4], [50.0, 90.0]],
"kernelNames": ["kernel_A", "kernel_B"],
}
mock_ceiling_data = {
"l1": [[0.01, 10], [10, 1000], 100],
"l2": [[0.01, 10], [10, 800], 80],
"hbm": [[0.01, 10], [10, 500], 50],
"valu": [[1, 100], [200, 200], 200],
"mfma": [[1, 100], [500, 500], 500],
}
plot_points_data = []
cache_colors = {
"ai_l1": "blue",
"ai_l2": "green",
"ai_hbm": "red",
}
roofline_instance = Roofline(args, mspec)
for cache_level in ["ai_l1", "ai_l2", "ai_hbm"]:
if cache_level in mock_ai_data:
x_vals = mock_ai_data[cache_level][0]
y_vals = mock_ai_data[cache_level][1]
num_kernels = len(mock_ai_data["kernelNames"])
for i in range(min(len(x_vals), num_kernels)):
if x_vals[i] > 0 and y_vals[i] > 0:
status = roofline_instance._determine_kernel_bound_status(
ai_value=x_vals[i],
performance=y_vals[i],
cache_level=cache_level,
ceiling_data=mock_ceiling_data,
)
plot_points_data.append({
"symbol": None,
"color": cache_colors.get(cache_level, "gray"),
"cache_level": cache_level.replace("ai_", "", 1).upper(),
"ai": f"{x_vals[i]:.2f}",
"performance": f"{y_vals[i]:.2f}",
"status": status,
"kernel_idx": i,
})
assert len(plot_points_data) > 0, "Plot points data should not be empty"
for point in plot_points_data:
assert "cache_level" in point
assert "ai" in point
assert "performance" in point
assert "status" in point
assert "kernel_idx" in point
assert "color" in point
assert point["cache_level"] in ["L1", "L2", "HBM"]
assert point["status"] in ["Memory Bound", "Compute Bound", "Unknown"]
assert isinstance(point["ai"], str)
assert isinstance(point["performance"], str)
except ImportError:
pytest.skip("Could not import roofline module for direct testing")
@pytest.mark.roofline
def test_roofline_bound_status_calculation():
"""
Test _determine_kernel_bound_status() correctly classifies kernels as
Memory Bound or Compute Bound based on their AI and performance vs ceilings.
"""
if soc in ("MI100"):
pytest.skip("Skipping roofline test for MI100")
return
sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
try:
from roofline import Roofline
from utils.specs import generate_machine_specs
class MockArgs:
def __init__(self):
self.roof_only = True
self.mem_level = "ALL"
self.sort = "ALL"
self.roofline_data_type = ["FP32"]
args = MockArgs()
mspec = generate_machine_specs(None, None)
roofline_instance = Roofline(args, mspec)
ceiling_data = {
"hbm": [[0.01, 10], [10, 1000], 100],
"valu": [[1, 100], [200, 200], 200],
"mfma": [[1, 100], [500, 500], 500],
}
status1 = roofline_instance._determine_kernel_bound_status(
ai_value=1.0,
performance=100.0,
cache_level="ai_hbm",
ceiling_data=ceiling_data,
)
assert status1 == "Memory Bound", f"Expected Memory Bound, got {status1}"
status2 = roofline_instance._determine_kernel_bound_status(
ai_value=5.0,
performance=150.0,
cache_level="ai_hbm",
ceiling_data=ceiling_data,
)
assert status2 == "Compute Bound", f"Expected Compute Bound, got {status2}"
status3 = roofline_instance._determine_kernel_bound_status(
ai_value=1.0,
performance=100.0,
cache_level="ai_l1",
ceiling_data=ceiling_data,
)
assert status3 == "Unknown", f"Expected Unknown, got {status3}"
bad_ceiling_data = {
"hbm": [100],
}
status4 = roofline_instance._determine_kernel_bound_status(
ai_value=1.0,
performance=100.0,
cache_level="ai_hbm",
ceiling_data=bad_ceiling_data,
)
assert status4 == "Unknown", f"Expected Unknown for bad data, got {status4}"
except ImportError:
pytest.skip("Could not import roofline module for direct testing")
@pytest.mark.roofline
def test_roofline_many_kernels_dynamic_height(binary_handler_profile_rocprof_compute):
"""
Test roofline PDF generation with many kernels (10+) to verify:
- Dynamic height calculation works
- PDF is generated successfully
- File size is reasonable
Note: This test uses a regular workload but validates the PDF structure
can handle the multi-subplot layout properly.
"""
if soc in ("MI100"):
pytest.skip("Skipping roofline test for MI100")
return
options = ["--device", "0", "--roof-only"]
workload_dir = test_utils.get_output_dir()
returncode = binary_handler_profile_rocprof_compute(
config, workload_dir, options, check_success=False, roof=True
)
assert returncode == 0, "Roofline profiling should succeed"
pdf_files = list(Path(workload_dir).glob("empirRoof_*.pdf"))
assert len(pdf_files) > 0, "At least one roofline PDF should be generated"
for pdf_file in pdf_files:
assert pdf_file.exists(), f"PDF file {pdf_file} should exist"
file_size = pdf_file.stat().st_size
# PDF should be larger than 10KB (has content) but less than 50MB (reasonable)
assert file_size > 10000, (
f"PDF {pdf_file} too small ({file_size} bytes), may be malformed"
)
assert file_size < 50000000, (
f"PDF {pdf_file} too large ({file_size} bytes), may have issues"
)
file_dict = test_utils.check_csv_files(workload_dir, 1, num_kernels)
assert sorted(list(file_dict.keys())) == ROOF_ONLY_FILES
test_utils.clean_output_dir(config["cleanup"], workload_dir)
@pytest.mark.misc
def test_device_filter(binary_handler_profile_rocprof_compute):
options = ["--device", "0"]
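The new `test_roofline_bound_status_calculation` fixes the expected classifications. It does not show how `Roofline._determine_kernel_bound_status()` is implemented, and this diff doesn't either.

Below is a minimal sketch that reproduces the same four cases using the classic ridge-point rule. A point is memory bound when its arithmetic intensity lies left of the ridge AI = peak compute / bandwidth, where the memory roof meets the compute roof. The sketch rests on several hypothetical assumptions:

- Each ceiling entry is `[xs, ys, value]`, and `value` is the bandwidth or compute peak.
- The compute roof is the larger of the `valu` and `mfma` peaks.
- Ties count as compute bound.
- The `performance` argument is not needed for the decision.

The real method may differ on any of these points.

```python
from numbers import Real

# Hypothetical names of the compute ceilings in the ceiling dictionary.
COMPUTE_KEYS = ("valu", "mfma")


def _peak(entry):
    """Return the scalar in an assumed [xs, ys, value] ceiling entry, or None if malformed."""
    if isinstance(entry, (list, tuple)) and len(entry) == 3 and isinstance(entry[2], Real):
        return float(entry[2])
    return None


def determine_bound_status(ai_value, cache_level, ceiling_data):
    """Classify a roofline point by the ridge-point rule (sketch, not the real method)."""
    mem_key = cache_level.replace("ai_", "", 1)  # e.g. "ai_hbm" -> "hbm"
    bandwidth = _peak(ceiling_data.get(mem_key))
    peaks = [p for p in (_peak(ceiling_data.get(k)) for k in COMPUTE_KEYS) if p is not None]
    if bandwidth is None or bandwidth <= 0 or not peaks:
        # Missing cache level, malformed entry, or no compute ceiling.
        return "Unknown"
    # Ridge point: the AI at which bandwidth * AI reaches the highest compute roof.
    ridge_ai = max(peaks) / bandwidth
    return "Memory Bound" if ai_value < ridge_ai else "Compute Bound"
```

Take the `ceiling_data` from the test, with an HBM value of 100 and an `mfma` peak of 500. The ridge then sits at 500 / 100 = 5.0 FLOPs/Byte, so AI 1.0 classifies as memory bound and AI 5.0 as compute bound. A missing `l1` entry or the malformed `"hbm": [100]` yields "Unknown".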
-1
@@ -6,7 +6,6 @@ format_rocprof_output: csv
hip_trace: false
join_type: grid
kernel: null
kernel_names: false
kokkos_trace: false
list_metrics: null
loglevel: 10