Arquivos
rocm-systems/test/ext-plugins/README.md
T
Kapil S. Pawar c7f400dbff Functional Tests for Ext-Profiler Plugin (#2007)
* Add functional tests for CSV Tuner Plugin

* Updated directory structure

* Updated and renamed directories

* Updated csv conf files

* Added tests for ext-profiler

* Updated readme

* Updated readme
2025-11-18 11:20:39 -06:00

6.7 KiB

RCCL Plugin Tests

Description

This directory contains automated tests for RCCL (ROCm Communication Collectives Library) plugins:

  1. CSV Tuner Plugin: Validates the functionality of the CSV-based tuning plugin across different collective operations (AllReduce, Broadcast, Reduce, AllGather, and ReduceScatter) and various configuration scenarios.

  2. Profiler Plugin: Validates the profiler plugin that captures detailed runtime events for collective and P2P operations, including Group, Collective, P2P, ProxyOp, ProxyStep, and GPU kernel events.

The tests are written in Python using the pytest framework, making it easy to run, maintain, and extend the test coverage.

Directory Structure

ext-plugins/
├── README.md                    # This file - documentation for the test suite
├── requirements.txt             # Python dependencies required for testing
├── pytest.ini                   # Pytest configuration and test markers
├── .gitignore                   # Git ignore rules for Python/pytest artifacts
├── venv/                        # Python virtual environment (created after setup)
├── logs/                        # Test execution logs and output files
├── profiler_dumps/              # Profiler plugin output (JSON trace files)
├── assets/                      # Test configuration files and assets
│   └── csv_confs/               # CSV configuration files for testing
│       ├── incorrect_values_config.conf
│       ├── multinode_config.conf
│       ├── no_matching_config.conf
│       ├── singlenode_config.conf
│       ├── unsupported_algo_proto_config.conf
│       ├── valid_config_with_wildcards.conf
│       └── valid_config_without_wildcards.conf
└── tests/                       # Test suite directory
    ├── conftest.py              # Pytest fixtures and shared test configuration
    ├── ext-tuner/               # CSV Tuner Plugin specific tests
    │   ├── test_allreduce.py
    │   ├── test_broadcast.py
    │   ├── test_reduce.py
    │   ├── test_allgather.py
    │   └── test_reducescatter.py
    └── ext-profiler/            # Profiler Plugin specific tests
        ├── test_allreduce.py
        ├── test_broadcast.py
        ├── test_reduce.py
        ├── test_allgather.py
        ├── test_alltoall.py
        ├── test_reducescatter.py
        └── test_sendrecv.py

Installation & Setup

Prerequisites

  • Python 3.6 or higher
  • RCCL library installed
  • ROCm environment configured
  • Important: Update the installation paths in tests/conftest.py to match your environment:
RCCL_INSTALL_DIR = "path/to/rccl"
OMPI_INSTALL_DIR = "path/to/ompi"
RCCL_TESTS_DIR = "path/to/rccl-tests"

Replace these placeholder paths with your actual installation directories before running the tests.

Building the RCCL Plugins

Before running the tests, you need to build the RCCL plugin libraries.

Building the CSV Tuner Plugin

The CSV tuner plugin is located in the ext-tuner/example directory.

Step 1: Navigate to the plugin directory

cd rccl/ext-tuner/example

Step 2: Build the plugin

make

This will compile the plugin and create libnccl-tuner-example.so in the same directory.

Building the Profiler Plugin

The profiler plugin is located in the ext-plugins/example directory.

Step 1: Navigate to the plugin directory

cd rccl/test/ext-plugins/example

Step 2: Build the plugin

make

This will compile the plugin and create libnccl-profiler.so in the same directory. The profiler plugin captures detailed runtime events including:

  • Group events: High-level operation grouping
  • Collective events: AllReduce, Broadcast, Reduce, ReduceScatter operations
  • P2P events: Send, Recv, AllGather, AllToAll operations
  • ProxyOp events: Network proxy operations (ScheduleSend, ScheduleRecv, ProgressSend, ProgressRecv)
  • ProxyStep events: Detailed network steps (SendWait, RecvWait, GPU waits)
  • ProxyCtrl events: Proxy thread control (Append, Sleep)
  • GPU events: Kernel channel execution

Step 1: Create Virtual Environment

Create a Python virtual environment to isolate the test dependencies:

python3 -m venv venv

Step 2: Activate Virtual Environment

Activate the virtual environment using the appropriate command for your shell:

On Linux/Mac (bash/zsh):

source venv/bin/activate

Once activated, you should see (venv) at the beginning of your command prompt.

Step 3: Install Dependencies

Install the required Python packages:

pip install -r requirements.txt

Running Tests

Run All Tests

To run the entire test suite:

pytest --cache-clear

Run Tests with Verbose Output

For more detailed test output:

pytest -v --cache-clear

Run Tests by Marker

Tests are organized using pytest markers. You can run specific groups of tests:

Run plugin-specific tests:

pytest -m ext_tuner --cache-clear      # CSV Tuner Plugin tests only
pytest -m ext_profiler --cache-clear   # Profiler Plugin tests only

Run tests for specific collective operations (across all plugins):

pytest -m allreduce --cache-clear      # AllReduce tests
pytest -m broadcast --cache-clear      # Broadcast tests
pytest -m reduce --cache-clear         # Reduce tests
pytest -m allgather --cache-clear      # AllGather tests
pytest -m reducescatter --cache-clear  # ReduceScatter tests
pytest -m alltoall --cache-clear       # AllToAll tests (profiler only)
pytest -m sendrecv --cache-clear       # SendRecv tests (profiler only)

Combine markers to run specific tests:

pytest -m "ext_profiler and allreduce" --cache-clear   # Profiler AllReduce tests only
pytest -m "ext_tuner and broadcast" --cache-clear      # Tuner Broadcast tests only

Run Tests with Log Output

To see live log output during test execution:

pytest -v -s --cache-clear

Generate Test Report

To generate a detailed test report:

pytest --verbose --tb=short

Additional Notes

  • Deactivating Virtual Environment: When you're done testing, deactivate the virtual environment by running:

    deactivate
    
  • Log Files: Test execution logs are stored in the logs/ directory for later review and debugging.

  • Profiler Output: Profiler plugin tests generate JSON trace files in the profiler_dumps/ directory. These files contain detailed event traces that can be analyzed for debugging or performance analysis. The directory is automatically cleaned before each test session by the pytest fixture.