diff --git a/projects/rocprofiler-compute/docs/archive/docs-1.x/profiling.md b/projects/rocprofiler-compute/docs/archive/docs-1.x/profiling.md index d3e6831a00..61827add37 100644 --- a/projects/rocprofiler-compute/docs/archive/docs-1.x/profiling.md +++ b/projects/rocprofiler-compute/docs/archive/docs-1.x/profiling.md @@ -105,7 +105,6 @@ Standalone Roofline Options: vL1D LDS --device GPU device ID. (DEFAULT: ALL) - --kernel-names Include kernel names in roofline plot. ``` The following sample command profiles the *vcopy* workload. @@ -377,7 +376,7 @@ Standalone Roofline Options: - The `--device` \ allows you to specify a device id to collect performace data from when running our roofline benchmark on your system. -- If you'd like to distinguish different kernels in your .pdf roofline plot use `--kernel-names`. This will give each kernel a unique marker identifiable from the plot's key. +- Each kernel in your .pdf roofline plot is automatically distinguished with a unique marker identifiable from the plot's key. #### Roofline Only diff --git a/projects/rocprofiler-compute/docs/archive/docs-2.x/profiling.md b/projects/rocprofiler-compute/docs/archive/docs-2.x/profiling.md index 9c699ec7a9..f79a055b24 100644 --- a/projects/rocprofiler-compute/docs/archive/docs-2.x/profiling.md +++ b/projects/rocprofiler-compute/docs/archive/docs-2.x/profiling.md @@ -315,7 +315,7 @@ Standalone Roofline Options: - The `--device` \ allows you to specify a device id to collect performance data from when running our roofline benchmark on your system. -- If you would like to distinguish different kernels in your .pdf roofline plot use `--kernel-names`. This will give each kernel a unique marker identifiable from the plot's key. +- Each kernel in your .pdf roofline plot is automatically distinguished with a unique marker identifiable from the plot's key. #### Roofline Only diff --git a/projects/rocprofiler-compute/docs/data/profile/sample-roof-plot.jpg b/projects/rocprofiler-compute/docs/data/profile/sample-roof-plot.jpg index 2deaba7ad2..7279c77d62 100644 Binary files a/projects/rocprofiler-compute/docs/data/profile/sample-roof-plot.jpg and b/projects/rocprofiler-compute/docs/data/profile/sample-roof-plot.jpg differ diff --git a/projects/rocprofiler-compute/docs/how-to/profile/mode.rst b/projects/rocprofiler-compute/docs/how-to/profile/mode.rst index d46005001b..4d08311e82 100644 --- a/projects/rocprofiler-compute/docs/how-to/profile/mode.rst +++ b/projects/rocprofiler-compute/docs/how-to/profile/mode.rst @@ -583,9 +583,11 @@ Roofline options For more information on data types supported based on the GPU architecture, see :doc:`../../conceptual/performance-model` -To distinguish different kernels in your ``.pdf`` roofline plot use -``--kernel-names``. This will give each kernel a unique marker identifiable from -the plot's key. +Each kernel in your ``.pdf`` roofline plot is automatically distinguished with a unique marker identifiable from the plot's key. The roofline PDF includes an integrated multi-subplot layout with: + +1. **Roofline Plot** - Shows performance ceilings and kernel arithmetic intensity points +2. **Plot Points & Values Table** - Displays AI values, performance metrics, memory/compute bound status, and cache levels for each kernel +3. **Full Kernel Names Table** - Lists complete kernel names with their corresponding plot markers Roofline only @@ -595,33 +597,52 @@ The following example demonstrates profiling roofline data only: .. code-block:: shell-session - $ rocprof-compute profile --name vcopy --roof-only -- ./vcopy -n 1048576 -b 256 - + $ rocprof-compute profile --name occupancy --roof-only -- ./tests/occupancy -n 1048576 -b 256 + __ _ + _ __ ___ ___ _ __ _ __ ___ / _| ___ ___ _ __ ___ _ __ _ _| |_ ___ + | '__/ _ \ / __| '_ \| '__/ _ \| |_ _____ / __/ _ \| '_ ` _ \| '_ \| | | | __/ _ \ + | | | (_) | (__| |_) | | | (_) | _|_____| (_| (_) | | | | | | |_) | |_| | || __/ + |_| \___/ \___| .__/|_| \___/|_| \___\___/|_| |_| |_| .__/ \__,_|\__\___| + |_| |_| ... - [roofline] Checking for roofline.csv in /home/auser/repos/rocprofiler-compute/sample/workloads/vcopy/MI200 - [roofline] No roofline data found. Generating... - Checking for roofline.csv in /home/auser/repos/rocprofiler-compute/sample/workloads/vcopy/MI200 + INFO [roofline] Generating pmc_perf.csv (roofline counters only). + INFO Rocprofiler-Compute version: 3.3.0 + INFO Profiler choice: rocprofiler-sdk + INFO Path: /app/projects/rocprofiler-compute/workloads/occupancy/MI300X_A1 + INFO Target: MI300X_A1 + INFO Command: ./tests/occupancy -n 1048576 -b 256 + INFO Kernel Selection: None + INFO Dispatch Selection: None + INFO Filtered sections: ['4'] + INFO + INFO ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + INFO Collecting Performance Counters (Roofline Only) + INFO ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + INFO + INFO [Run 1/3][Approximate profiling time left: pending first measurement...] + INFO [profiling] Current input file: /app/projects/rocprofiler-compute/workloads/occupancy/MI300X_A1/perfmon/pmc_perf_0.txt + ... + INFO [roofline] Checking for roofline.csv in /app/projects/rocprofiler-compute/workloads/occupancy/MI300X_A1 + INFO [roofline] No roofline data found. Generating... Empirical Roofline Calculation - Copyright © 2022 Advanced Micro Devices, Inc. All rights reserved. - Total detected GPU devices: 4 - GPU Device 0: Profiling... - 99% [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ] - ... - Empirical Roofline PDFs saved! - + Copyright © 2025 Advanced Micro Devices, Inc. All rights reserved. + Total detected GPU devices: 8 + GPU Device 0 (gfx942) with 304 CUs: Profiling... + 99% [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ] + ... An inspection of our workload output folder shows ``.pdf`` plots were generated successfully. .. code-block:: shell-session - $ ls workloads/vcopy/MI200/ + $ ls workloads/occupancy/MI300X_A1 total 48 - -rw-r--r-- 1 auser agroup 13331 Mar 1 16:05 empirRoof_gpu-0_FP32.pdf - drwxr-xr-x 1 auser agroup 0 Mar 1 16:03 perfmon - -rw-r--r-- 1 auser agroup 1101 Mar 1 16:03 pmc_perf.csv - -rw-r--r-- 1 auser agroup 1715 Mar 1 16:05 roofline.csv - -rw-r--r-- 1 auser agroup 650 Mar 1 16:03 sysinfo.csv - -rw-r--r-- 1 auser agroup 399 Mar 1 16:03 timestamps.csv + -rw-r--r-- 1 auser agroup 13331 Oct 29 10:33 empirRoof_gpu-0_FP32.pdf + drwxr-xr-x 1 auser agroup 0 Oct 29 10:33 perfmon + -rw-r--r-- 1 auser agroup 1101 Oct 29 10:33 pmc_perf.csv + -rw-r--r-- 1 auser agroup 1715 Oct 29 10:33 roofline.csv + -rw-r--r-- 1 auser agroup 650 Oct 29 10:33 sysinfo.csv + -rw-r--r-- 1 auser agroup 399 Oct 29 10:33 timestamps.csv .. note:: diff --git a/projects/rocprofiler-compute/src/argparser.py b/projects/rocprofiler-compute/src/argparser.py index 240d7c6ac6..826841e7ca 100644 --- a/projects/rocprofiler-compute/src/argparser.py +++ b/projects/rocprofiler-compute/src/argparser.py @@ -439,13 +439,6 @@ Examples: type=int, help="\t\t\tTarget GPU device ID. (DEFAULT: 0)", ) - roofline_group.add_argument( - "--kernel-names", - required=False, - default=False, - action="store_true", - help="\t\t\tInclude kernel names in roofline plot.", - ) roofline_group.add_argument( "-R", "--roofline-data-type", diff --git a/projects/rocprofiler-compute/tests/generate_workloads.sh b/projects/rocprofiler-compute/tests/generate_workloads.sh index fe2e141d61..fa9242c109 100755 --- a/projects/rocprofiler-compute/tests/generate_workloads.sh +++ b/projects/rocprofiler-compute/tests/generate_workloads.sh @@ -2,7 +2,7 @@ declare -A commands=( [path]=' ' [no_roof]='--no-roof' - [kernel_names]='--roof-only --kernel-names' + [kernel_names]='--roof-only' [device_filter]='--device 0' [kernel]='--kernel "vecCopy(double*, double*, double*, int, int) [clone .kd]"' [ipblocks_SQ]='-b SQ' diff --git a/projects/rocprofiler-compute/tests/test_profile_general.py b/projects/rocprofiler-compute/tests/test_profile_general.py index 1150b16a92..758b2e9e5e 100644 --- a/projects/rocprofiler-compute/tests/test_profile_general.py +++ b/projects/rocprofiler-compute/tests/test_profile_general.py @@ -587,14 +587,21 @@ def test_path_rocpd( @pytest.mark.roofline -def test_roof_kernel_names(binary_handler_profile_rocprof_compute): +def test_roof_basic_validation(binary_handler_profile_rocprof_compute): + """ + Test basic roofline PDF generation with full validation pipeline. + This test runs the complete validation flow including counter logging + and metric comparison (if enabled in config). Validates that roofline PDFs + are generated with the integrated multi-subplot layout (roofline plot + + plot points table + kernel names table). + """ if soc in ("MI100"): # roofline is not supported on MI100 assert True # Do not continue testing return - options = ["--device", "0", "--roof-only", "--kernel-names"] + options = ["--device", "0", "--roof-only"] workload_dir = test_utils.get_output_dir() returncode = binary_handler_profile_rocprof_compute( config, workload_dir, options, check_success=False, roof=True @@ -631,7 +638,6 @@ def test_roof_multiple_data_types(binary_handler_profile_rocprof_compute): "--device", "0", "--roof-only", - "--kernel-names", "--roofline-data-type", dtype, ] @@ -666,7 +672,6 @@ def test_roof_invalid_data_type(binary_handler_profile_rocprof_compute): "--device", "0", "--roof-only", - "--kernel-names", "--roofline-data-type", "INVALID_TYPE", ] @@ -815,7 +820,6 @@ def test_roofline_workload_dir_not_set_error(): class MockArgs: def __init__(self): self.roof_only = True - self.kernel_names = False self.mem_level = "ALL" self.sort = "ALL" self.roofline_data_type = ["FP32"] @@ -828,7 +832,6 @@ def test_roofline_workload_dir_not_set_error(): "device_id": 0, "sort_type": "kernels", "mem_level": "ALL", - "include_kernel_names": False, "is_standalone": True, "roofline_data_type": ["FP32"], } @@ -879,8 +882,15 @@ def test_roof_workload_dir_validation(binary_handler_profile_rocprof_compute): @pytest.mark.roofline def test_roofline_empty_kernel_names_handling(binary_handler_profile_rocprof_compute): """ - Test empirical_roofline() when num_kernels == 0 - This should trigger the "No kernel names found" log message + Test roofline behavior when kernel filter doesn't match any + kernels during initial profiling. + + When profiling with a non-matching kernel filter, no counter + data is collected, so roofline generation is skipped with a + warning (but returns success code 0). + + This is different from filtering existing profiling data with + a non-matching kernel name, which produces an explicit error. """ if soc in ("MI100"): pytest.skip("Skipping roofline test for MI100") @@ -890,14 +900,20 @@ def test_roofline_empty_kernel_names_handling(binary_handler_profile_rocprof_com "--device", "0", "--roof-only", - "--kernel-names", "--kernel", "nonexistent_kernel_name_that_should_not_match_anything", ] workload_dir = test_utils.get_output_dir() - returncode = binary_handler_profile_rocprof_compute( # noqa: F841 - config, workload_dir, options, check_success=True, roof=True + returncode = binary_handler_profile_rocprof_compute( + config, workload_dir, options, check_success=False, roof=True + ) + + assert returncode == 0, f"Expected success (returncode=0), got {returncode}" + + pdf_files = list(Path(workload_dir).glob("empirRoof_*.pdf")) + assert len(pdf_files) == 0, ( + "No roofline PDF should be generated when no kernels match" ) test_utils.clean_output_dir(config["cleanup"], workload_dir) @@ -918,7 +934,6 @@ def test_roofline_kernel_filter(binary_handler_profile_rocprof_compute): "--device", "0", "--roof-only", - "--kernel-names", ] workload_dir = test_utils.get_output_dir() @@ -991,7 +1006,6 @@ def test_roof_plot_modes(binary_handler_profile_rocprof_compute): "--device", "0", "--roof-only", - "--kernel-names", "--kernel", config["kernel_name_1"], ], @@ -1083,7 +1097,6 @@ def test_roofline_missing_file_handling(binary_handler_profile_rocprof_compute): class MockArgs: def __init__(self): self.roof_only = True - self.kernel_names = False self.mem_level = "ALL" self.sort = "ALL" self.roofline_data_type = ["FP32"] @@ -1098,7 +1111,6 @@ def test_roofline_missing_file_handling(binary_handler_profile_rocprof_compute): "device_id": 0, "sort_type": "kernels", "mem_level": "ALL", - "include_kernel_names": False, "is_standalone": True, "roofline_data_type": ["FP32"], } @@ -1137,7 +1149,6 @@ def test_roofline_invalid_datatype_cli(binary_handler_profile_rocprof_compute): class MockArgs: def __init__(self): self.roof_only = True - self.kernel_names = False self.mem_level = "ALL" self.sort = "ALL" self.roofline_data_type = ["FP32"] @@ -1150,7 +1161,6 @@ def test_roofline_invalid_datatype_cli(binary_handler_profile_rocprof_compute): "device_id": 0, "sort_type": "kernels", "mem_level": "ALL", - "include_kernel_names": False, "is_standalone": True, "roofline_data_type": ["FP32"], } @@ -1187,6 +1197,223 @@ def test_roofline_ceiling_data_validation(binary_handler_profile_rocprof_compute test_utils.clean_output_dir(config["cleanup"], workload_dir) +@pytest.mark.roofline +def test_roofline_plot_points_data_generation(): + """ + Test that plot points data structure is correctly generated with: + - Symbol assignments + - AI values (FLOPs/Byte) + - Performance values (GFLOPs/s) + - Memory/Compute bound status + - Cache level information + """ + if soc in ("MI100"): + pytest.skip("Skipping roofline test for MI100") + return + + sys.path.insert(0, str(Path(__file__).parent.parent / "src")) + + try: + from roofline import Roofline + from utils.specs import generate_machine_specs + + class MockArgs: + def __init__(self): + self.roof_only = True + self.mem_level = "ALL" + self.sort = "ALL" + self.roofline_data_type = ["FP32"] + + args = MockArgs() + mspec = generate_machine_specs(None, None) + + mock_ai_data = { + "ai_l1": [[0.5, 1.2], [100.0, 150.0]], + "ai_l2": [[0.3, 0.8], [80.0, 120.0]], + "ai_hbm": [[0.1, 0.4], [50.0, 90.0]], + "kernelNames": ["kernel_A", "kernel_B"], + } + + mock_ceiling_data = { + "l1": [[0.01, 10], [10, 1000], 100], + "l2": [[0.01, 10], [10, 800], 80], + "hbm": [[0.01, 10], [10, 500], 50], + "valu": [[1, 100], [200, 200], 200], + "mfma": [[1, 100], [500, 500], 500], + } + + plot_points_data = [] + cache_colors = { + "ai_l1": "blue", + "ai_l2": "green", + "ai_hbm": "red", + } + + roofline_instance = Roofline(args, mspec) + + for cache_level in ["ai_l1", "ai_l2", "ai_hbm"]: + if cache_level in mock_ai_data: + x_vals = mock_ai_data[cache_level][0] + y_vals = mock_ai_data[cache_level][1] + num_kernels = len(mock_ai_data["kernelNames"]) + + for i in range(min(len(x_vals), num_kernels)): + if x_vals[i] > 0 and y_vals[i] > 0: + status = roofline_instance._determine_kernel_bound_status( + ai_value=x_vals[i], + performance=y_vals[i], + cache_level=cache_level, + ceiling_data=mock_ceiling_data, + ) + + plot_points_data.append({ + "symbol": None, + "color": cache_colors.get(cache_level, "gray"), + "cache_level": cache_level.replace("ai_", "", 1).upper(), + "ai": f"{x_vals[i]:.2f}", + "performance": f"{y_vals[i]:.2f}", + "status": status, + "kernel_idx": i, + }) + + assert len(plot_points_data) > 0, "Plot points data should not be empty" + + for point in plot_points_data: + assert "cache_level" in point + assert "ai" in point + assert "performance" in point + assert "status" in point + assert "kernel_idx" in point + assert "color" in point + + assert point["cache_level"] in ["L1", "L2", "HBM"] + + assert point["status"] in ["Memory Bound", "Compute Bound", "Unknown"] + + assert isinstance(point["ai"], str) + assert isinstance(point["performance"], str) + + except ImportError: + pytest.skip("Could not import roofline module for direct testing") + + +@pytest.mark.roofline +def test_roofline_bound_status_calculation(): + """ + Test _determine_kernel_bound_status() correctly classifies kernels as + Memory Bound or Compute Bound based on their AI and performance vs ceilings. + """ + if soc in ("MI100"): + pytest.skip("Skipping roofline test for MI100") + return + + sys.path.insert(0, str(Path(__file__).parent.parent / "src")) + + try: + from roofline import Roofline + from utils.specs import generate_machine_specs + + class MockArgs: + def __init__(self): + self.roof_only = True + self.mem_level = "ALL" + self.sort = "ALL" + self.roofline_data_type = ["FP32"] + + args = MockArgs() + mspec = generate_machine_specs(None, None) + roofline_instance = Roofline(args, mspec) + + ceiling_data = { + "hbm": [[0.01, 10], [10, 1000], 100], + "valu": [[1, 100], [200, 200], 200], + "mfma": [[1, 100], [500, 500], 500], + } + + status1 = roofline_instance._determine_kernel_bound_status( + ai_value=1.0, + performance=100.0, + cache_level="ai_hbm", + ceiling_data=ceiling_data, + ) + assert status1 == "Memory Bound", f"Expected Memory Bound, got {status1}" + + status2 = roofline_instance._determine_kernel_bound_status( + ai_value=5.0, + performance=150.0, + cache_level="ai_hbm", + ceiling_data=ceiling_data, + ) + assert status2 == "Compute Bound", f"Expected Compute Bound, got {status2}" + + status3 = roofline_instance._determine_kernel_bound_status( + ai_value=1.0, + performance=100.0, + cache_level="ai_l1", + ceiling_data=ceiling_data, + ) + assert status3 == "Unknown", f"Expected Unknown, got {status3}" + + bad_ceiling_data = { + "hbm": [100], + } + status4 = roofline_instance._determine_kernel_bound_status( + ai_value=1.0, + performance=100.0, + cache_level="ai_hbm", + ceiling_data=bad_ceiling_data, + ) + assert status4 == "Unknown", f"Expected Unknown for bad data, got {status4}" + + except ImportError: + pytest.skip("Could not import roofline module for direct testing") + + +@pytest.mark.roofline +def test_roofline_many_kernels_dynamic_height(binary_handler_profile_rocprof_compute): + """ + Test roofline PDF generation with many kernels (10+) to verify: + - Dynamic height calculation works + - PDF is generated successfully + - File size is reasonable + + Note: This test uses a regular workload but validates the PDF structure + can handle the multi-subplot layout properly. + """ + if soc in ("MI100"): + pytest.skip("Skipping roofline test for MI100") + return + + options = ["--device", "0", "--roof-only"] + workload_dir = test_utils.get_output_dir() + + returncode = binary_handler_profile_rocprof_compute( + config, workload_dir, options, check_success=False, roof=True + ) + + assert returncode == 0, "Roofline profiling should succeed" + + pdf_files = list(Path(workload_dir).glob("empirRoof_*.pdf")) + assert len(pdf_files) > 0, "At least one roofline PDF should be generated" + + for pdf_file in pdf_files: + assert pdf_file.exists(), f"PDF file {pdf_file} should exist" + file_size = pdf_file.stat().st_size + + # PDF should be larger than 10KB (has content) but less than 50MB (reasonable) + assert file_size > 10000, ( + f"PDF {pdf_file} too small ({file_size} bytes), may be malformed" + ) + assert file_size < 50000000, ( + f"PDF {pdf_file} too large ({file_size} bytes), may have issues" + ) + + file_dict = test_utils.check_csv_files(workload_dir, 1, num_kernels) + assert sorted(list(file_dict.keys())) == ROOF_ONLY_FILES + + test_utils.clean_output_dir(config["cleanup"], workload_dir) + + @pytest.mark.misc def test_device_filter(binary_handler_profile_rocprof_compute): options = ["--device", "0"] diff --git a/projects/rocprofiler-compute/tests/workloads/vcopy/MI350/profiling_config.yaml b/projects/rocprofiler-compute/tests/workloads/vcopy/MI350/profiling_config.yaml index 745cc95955..fb20e78a22 100644 --- a/projects/rocprofiler-compute/tests/workloads/vcopy/MI350/profiling_config.yaml +++ b/projects/rocprofiler-compute/tests/workloads/vcopy/MI350/profiling_config.yaml @@ -6,7 +6,6 @@ format_rocprof_output: csv hip_trace: false join_type: grid kernel: null -kernel_names: false kokkos_trace: false list_metrics: null loglevel: 10