* Add functional tests for CSV Tuner Plugin
* Updated directory structure
* Updated and renamed directories
* Updated csv conf files
* Added tests for ext-profiler
* Updated readme
* Updated readme
[ROCm/rccl commit: c7f400dbff]
RCCL Plugin Tests
Description
This directory contains automated tests for RCCL (ROCm Communication Collectives Library) plugins:
-
CSV Tuner Plugin: Validates the functionality of the CSV-based tuning plugin across different collective operations (AllReduce, Broadcast, Reduce, AllGather, and ReduceScatter) and various configuration scenarios.
-
Profiler Plugin: Validates the profiler plugin that captures detailed runtime events for collective and P2P operations, including Group, Collective, P2P, ProxyOp, ProxyStep, and GPU kernel events.
The tests are written in Python using the pytest framework, making it easy to run, maintain, and extend the test coverage.
Directory Structure
ext-plugins/
├── README.md # This file - documentation for the test suite
├── requirements.txt # Python dependencies required for testing
├── pytest.ini # Pytest configuration and test markers
├── .gitignore # Git ignore rules for Python/pytest artifacts
├── venv/ # Python virtual environment (created after setup)
├── logs/ # Test execution logs and output files
├── profiler_dumps/ # Profiler plugin output (JSON trace files)
├── assets/ # Test configuration files and assets
│ └── csv_confs/ # CSV configuration files for testing
│ ├── incorrect_values_config.conf
│ ├── multinode_config.conf
│ ├── no_matching_config.conf
│ ├── singlenode_config.conf
│ ├── unsupported_algo_proto_config.conf
│ ├── valid_config_with_wildcards.conf
│ └── valid_config_without_wildcards.conf
└── tests/ # Test suite directory
├── conftest.py # Pytest fixtures and shared test configuration
├── ext-tuner/ # CSV Tuner Plugin specific tests
│ ├── test_allreduce.py
│ ├── test_broadcast.py
│ ├── test_reduce.py
│ ├── test_allgather.py
│ └── test_reducescatter.py
└── ext-profiler/ # Profiler Plugin specific tests
├── test_allreduce.py
├── test_broadcast.py
├── test_reduce.py
├── test_allgather.py
├── test_alltoall.py
├── test_reducescatter.py
└── test_sendrecv.py
Installation & Setup
Prerequisites
- Python 3.6 or higher
- RCCL library installed
- ROCm environment configured
- Important: Update the installation paths in
tests/conftest.pyto match your environment:
RCCL_INSTALL_DIR = "path/to/rccl"
OMPI_INSTALL_DIR = "path/to/ompi"
RCCL_TESTS_DIR = "path/to/rccl-tests"
Replace these placeholder paths with your actual installation directories before running the tests.
Building the RCCL Plugins
Before running the tests, you need to build the RCCL plugin libraries.
Building the CSV Tuner Plugin
The CSV tuner plugin is located in the ext-tuner/example directory.
Step 1: Navigate to the plugin directory
cd rccl/ext-tuner/example
Step 2: Build the plugin
make
This will compile the plugin and create libnccl-tuner-example.so in the same directory.
Building the Profiler Plugin
The profiler plugin is located in the ext-plugins/example directory.
Step 1: Navigate to the plugin directory
cd rccl/test/ext-plugins/example
Step 2: Build the plugin
make
This will compile the plugin and create libnccl-profiler.so in the same directory. The profiler plugin captures detailed runtime events including:
- Group events: High-level operation grouping
- Collective events: AllReduce, Broadcast, Reduce, ReduceScatter operations
- P2P events: Send, Recv, AllGather, AllToAll operations
- ProxyOp events: Network proxy operations (ScheduleSend, ScheduleRecv, ProgressSend, ProgressRecv)
- ProxyStep events: Detailed network steps (SendWait, RecvWait, GPU waits)
- ProxyCtrl events: Proxy thread control (Append, Sleep)
- GPU events: Kernel channel execution
Step 1: Create Virtual Environment
Create a Python virtual environment to isolate the test dependencies:
python3 -m venv venv
Step 2: Activate Virtual Environment
Activate the virtual environment using the appropriate command for your shell:
On Linux/Mac (bash/zsh):
source venv/bin/activate
Once activated, you should see (venv) at the beginning of your command prompt.
Step 3: Install Dependencies
Install the required Python packages:
pip install -r requirements.txt
Running Tests
Run All Tests
To run the entire test suite:
pytest --cache-clear
Run Tests with Verbose Output
For more detailed test output:
pytest -v --cache-clear
Run Tests by Marker
Tests are organized using pytest markers. You can run specific groups of tests:
Run plugin-specific tests:
pytest -m ext_tuner --cache-clear # CSV Tuner Plugin tests only
pytest -m ext_profiler --cache-clear # Profiler Plugin tests only
Run tests for specific collective operations (across all plugins):
pytest -m allreduce --cache-clear # AllReduce tests
pytest -m broadcast --cache-clear # Broadcast tests
pytest -m reduce --cache-clear # Reduce tests
pytest -m allgather --cache-clear # AllGather tests
pytest -m reducescatter --cache-clear # ReduceScatter tests
pytest -m alltoall --cache-clear # AllToAll tests (profiler only)
pytest -m sendrecv --cache-clear # SendRecv tests (profiler only)
Combine markers to run specific tests:
pytest -m "ext_profiler and allreduce" --cache-clear # Profiler AllReduce tests only
pytest -m "ext_tuner and broadcast" --cache-clear # Tuner Broadcast tests only
Run Tests with Log Output
To see live log output during test execution:
pytest -v -s --cache-clear
Generate Test Report
To generate a detailed test report:
pytest --verbose --tb=short
Additional Notes
-
Deactivating Virtual Environment: When you're done testing, deactivate the virtual environment by running:
deactivate -
Log Files: Test execution logs are stored in the
logs/directory for later review and debugging. -
Profiler Output: Profiler plugin tests generate JSON trace files in the
profiler_dumps/directory. These files contain detailed event traces that can be analyzed for debugging or performance analysis. The directory is automatically cleaned before each test session by the pytest fixture.