This directory contains resources and examples for developing NCCL tuner plugins. Tuner plugins allow you to customize NCCL's algorithm and protocol selection behavior to optimize performance for specific workloads and hardware configurations.
## Overview
NCCL tuner plugins provide a way to influence NCCL's automatic algorithm and protocol selection by modifying the cost tables that NCCL uses to make decisions. This allows you to:
- Override default algorithm/protocol combinations for specific collective operations
- Customize tuning based on message size, topology, and other parameters
- Implement sophisticated tuning strategies without recompiling NCCL
- Optimize performance for specific hardware configurations or workloads
## Tuner Plugin Interface
NCCL tuner plugins must implement the `ncclTuner_t` interface defined in `nccl_tuner.h` within `nccl/src/include/plugin`. Each example plugin carries a forked copy of these definitions in `tuner.h`, and plugin implementors are expected to fork the internal NCCL definitions in the same way.
### 1. Plugin Structure
Your plugin must:
- Include the necessary forked NCCL headers (`tuner.h`)
- Implement all required interface functions
- Export the plugin structure with appropriate version
- Handle all input parameters gracefully
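
The requirements above can be sketched as a minimal skeleton. To keep the sketch compilable on its own, it does not include `tuner.h`; instead it declares reduced stand-ins. `sketchTuner_t`, `sketchContext_t`, `sketchTunerPlugin`, and the shortened function signatures are all hypothetical, and the real `ncclTuner_t` in your forked header may take additional parameters and requires a version-specific exported symbol name. Treat this as an outline of the shape, not the actual interface.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the result type that the forked tuner.h would supply.
 * Declared here only so the sketch compiles without NCCL headers. */
typedef enum { ncclSuccess = 0, ncclInternalError = 3 } ncclResult_t;

/* Hypothetical, reduced version of the plugin structure. The real
 * ncclTuner_t should be taken from tuner.h, not from this sketch. */
typedef struct {
  const char *name;
  ncclResult_t (*init)(size_t nRanks, size_t nNodes, void **context);
  ncclResult_t (*getCollInfo)(void *context, size_t nBytes,
                              float **collCostTable, int numAlgo,
                              int numProto, int *nChannels);
  ncclResult_t (*destroy)(void *context);
} sketchTuner_t;

/* Hypothetical per-communicator state kept by the plugin. */
typedef struct {
  size_t nRanks;
  size_t nNodes;
} sketchContext_t;

static ncclResult_t sketchInit(size_t nRanks, size_t nNodes, void **context) {
  if (context == NULL) return ncclInternalError;
  sketchContext_t *ctx = malloc(sizeof(*ctx));
  if (ctx == NULL) return ncclInternalError;
  ctx->nRanks = nRanks;
  ctx->nNodes = nNodes;
  *context = ctx;
  return ncclSuccess;
}

static ncclResult_t sketchGetCollInfo(void *context, size_t nBytes,
                                      float **collCostTable, int numAlgo,
                                      int numProto, int *nChannels) {
  /* This skeleton changes nothing, so NCCL's defaults stay in effect. */
  (void)context; (void)nBytes; (void)collCostTable;
  (void)numAlgo; (void)numProto; (void)nChannels;
  return ncclSuccess;
}

static ncclResult_t sketchDestroy(void *context) {
  free(context); /* free(NULL) is a no-op, so a NULL context is tolerated */
  return ncclSuccess;
}

/* Exported plugin structure. The symbol name NCCL actually looks up is
 * defined by the header and examples; this name is a placeholder. */
const sketchTuner_t sketchTunerPlugin = {
  .name = "sketch",
  .init = sketchInit,
  .getCollInfo = sketchGetCollInfo,
  .destroy = sketchDestroy,
};
```

The `basic/` and `example/` plugins in this directory show the real structure and exported symbol.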
### 2. Cost Table Modification
The `getCollInfo` function receives a cost table that maps algorithm/protocol combinations to performance costs. Lower costs indicate preferred combinations. You can:
- Set costs to `0.0` to make combinations highly preferred
- Set costs to `NCCL_ALGO_PROTO_IGNORE` to disable combinations
- Use relative costs to create preferences between options
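
As an illustration of the rules above, the following sketch edits a small cost table. The table dimensions, the `SKETCH_IGNORE` value (a stand-in for `NCCL_ALGO_PROTO_IGNORE`), and the helpers `sketchPrefer` and `sketchDisable` are assumptions made so the code compiles without NCCL headers. The choice not to re-enable an entry that is already marked as ignored is also an assumption of this sketch.

```c
#include <stddef.h>

/* Stand-ins for constants that tuner.h would normally provide. */
#define SKETCH_NUM_ALGO 3
#define SKETCH_NUM_PROTO 3
#define SKETCH_IGNORE (-1.0f) /* placeholder for NCCL_ALGO_PROTO_IGNORE */

/* Returns 1 if (algo, proto) is a valid index into the table, else 0. */
static int sketchInRange(int numAlgo, int numProto, int algo, int proto) {
  return algo >= 0 && algo < numAlgo && algo < SKETCH_NUM_ALGO &&
         proto >= 0 && proto < numProto && proto < SKETCH_NUM_PROTO;
}

/* Give one combination the lowest cost so it becomes highly preferred.
 * Entries already marked as ignored are left alone.
 * Returns 1 if the preference was applied, 0 otherwise. */
static int sketchPrefer(float (*table)[SKETCH_NUM_PROTO], int numAlgo,
                        int numProto, int algo, int proto) {
  if (table == NULL || !sketchInRange(numAlgo, numProto, algo, proto)) return 0;
  if (table[algo][proto] == SKETCH_IGNORE) return 0;
  table[algo][proto] = 0.0f;
  return 1;
}

/* Mark one combination as disabled. Out-of-range requests are ignored. */
static void sketchDisable(float (*table)[SKETCH_NUM_PROTO], int numAlgo,
                          int numProto, int algo, int proto) {
  if (table == NULL || !sketchInRange(numAlgo, numProto, algo, proto)) return;
  table[algo][proto] = SKETCH_IGNORE;
}
```

Checking the indices against `numAlgo` and `numProto` before writing keeps the plugin safe if NCCL passes a table with fewer entries than the plugin was compiled for.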
### 3. Channel Management
The `nChannels` parameter allows you to:
- Set a specific number of channels to use
- Return the original value to preserve NCCL's default behavior
- Implement dynamic channel selection based on message size or topology
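
A size-based policy along these lines might look like the sketch below. The thresholds, the channel counts, and the helper name `sketchSelectChannels` are hypothetical values chosen for illustration, not recommendations. For mid-sized messages the function leaves `*nChannels` untouched, which corresponds to returning the original value.

```c
#include <stddef.h>

/* Hypothetical thresholds; tune these for your own hardware. */
#define SKETCH_SMALL_BYTES ((size_t)64 * 1024)        /* 64 KiB */
#define SKETCH_LARGE_BYTES ((size_t)16 * 1024 * 1024) /* 16 MiB */

/* Pick a channel count from the message size.
 * Small messages get one channel, large messages get eight, and
 * everything in between keeps whatever value NCCL passed in. */
static void sketchSelectChannels(size_t nBytes, int *nChannels) {
  if (nChannels == NULL) return; /* nothing to update */
  if (nBytes < SKETCH_SMALL_BYTES) {
    *nChannels = 1;
  } else if (nBytes >= SKETCH_LARGE_BYTES) {
    *nChannels = 8;
  }
  /* Otherwise: leave *nChannels unchanged to preserve the default. */
}
```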
### 4. Error Handling
Always return appropriate `ncclResult_t` values:
- `ncclSuccess` for successful or ignored operations
- `ncclInternalError` for plugin-specific errors. Returning an error is only advisable during plugin initialization and destruction, because the overhead of a failed plugin call elsewhere can be very costly for users.
- Other NCCL error codes as appropriate
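
One way to follow this guidance is sketched below: the initialization function reports a real error, while the tuning call treats unusable input as an ignored operation and returns `ncclSuccess` without changing anything. The result enum is a local stand-in for the one in `tuner.h`, and `sketchContext_t`, `sketchInit`, and `sketchGetCollInfo` use simplified, hypothetical signatures rather than the real interface.

```c
#include <stddef.h>

/* Stand-in for the result type that tuner.h would supply. */
typedef enum { ncclSuccess = 0, ncclInternalError = 3 } ncclResult_t;

/* Hypothetical plugin state. */
typedef struct {
  int preferredChannels;
} sketchContext_t;

/* Initialization is a reasonable place to report a failure. */
static ncclResult_t sketchInit(void **context, sketchContext_t *storage) {
  if (context == NULL || storage == NULL) return ncclInternalError;
  storage->preferredChannels = 4; /* illustrative value only */
  *context = storage;
  return ncclSuccess;
}

/* On the tuning path, prefer "do nothing and succeed" over an error. */
static ncclResult_t sketchGetCollInfo(void *context, int *nChannels) {
  const sketchContext_t *ctx = context;
  if (ctx == NULL || nChannels == NULL || ctx->preferredChannels <= 0) {
    return ncclSuccess; /* ignored operation: NCCL keeps its defaults */
  }
  *nChannels = ctx->preferredChannels;
  return ncclSuccess;
}
```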
## Getting Started
### Option 1: Start with the Example Plugin
If you're new to tuner plugin development, start with the `example/` directory:
```bash
cd example/
make
```
This provides a CSV-based configuration system that you can customize or use as a template.
### Option 2: Use the Basic Plugin
If you need more customized tuning, you may prefer to start from a clean baseline. In that case, base your work on the plugin in the `basic/` directory:
```bash
cd basic/
make
```
## Building and Testing
### Build Requirements
- GCC or compatible C compiler
- NCCL headers (included in `nccl/` subdirectories)
- Make
### Build Process
Each plugin directory contains a Makefile:
```bash
cd basic/ # or example/
make
```
This generates a shared library (`.so` file) that can be loaded by NCCL.
### Loading the Plugin
Set the `LD_LIBRARY_PATH` to include your plugin directory: