Dateien
rocm-systems/ext-tuner/basic
nawrinsu 166268d715 Fix protocol and channel override when tuner is used (#1985)
* Fix protocol and channel override when tuner is used

* Added comment

* Fix README for basic tuner implementation
2025-11-03 13:56:34 -08:00
..
2025-06-18 10:34:47 -07:00

Basic NCCL Tuner Plugin

This directory contains a minimal placeholder implementation of an NCCL tuner plugin. It serves as a starting point for developing custom tuner plugins by providing the essential function stubs and interface structure required by NCCL.

Purpose

This basic plugin is designed to:

  • Provide a minimal working example of the NCCL tuner plugin interface
  • Serve as a template for developing custom tuner plugins
  • Demonstrate the required function signatures and structure
  • Implement placeholder functionality that can be extended

Implementation Details

The plugin implements the following functions:

pluginInit

ncclResult_t pluginInit(size_t nRanks, size_t nNodes, ncclDebugLogger_t logFunction, void **context)
  • Purpose: Initialize the plugin with communicator information
  • Current Implementation: Simple placeholder that returns success
  • Parameters:
    • nRanks: Total number of ranks in the communicator
    • nNodes: Total number of nodes in the communicator
    • logFunction: NCCL debug logging function
    • context: Plugin context pointer (output)

pluginGetCollInfo

ncclResult_t pluginGetCollInfo(void* context, ncclFunc_t collType, size_t nBytes,
                              int numPipeOps, float** collCostTable, int numAlgo, int numProto,
                              int regBuff, int* nChannels)
  • Purpose: Modify cost tables for collective operations
  • Current Implementation:
    • Sets RING+SIMPLE algorithm to cost 0.0 (highest preference)
    • Sets channel count to 0
  • Parameters:
    • context: Plugin context from init
    • collType: Type of collective operation
    • nBytes: Message size in bytes
    • numPipeOps: Number of pipeline operations
    • collCostTable: Cost table to modify
    • numAlgo: Number of algorithms
    • numProto: Number of protocols
    • regBuff: Whether buffer can be registered
    • nChannels: Number of channels to use (output)

pluginDestroy

ncclResult_t pluginDestroy(void* context)
  • Purpose: Clean up plugin resources
  • Current Implementation: Simple placeholder that returns success

Cost Table Structure

The plugin demonstrates how to modify NCCL's cost tables:

float (*table)[NCCL_NUM_PROTOCOLS] = (float (*)[NCCL_NUM_PROTOCOLS])collCostTable;

The cost table is a 2D array where:

  • First dimension: Algorithm index (e.g., NCCL_ALGO_RING)
  • Second dimension: Protocol index (e.g., NCCL_PROTO_SIMPLE)
  • Values: Cost for that algorithm/protocol combination

Cost Values

  • 0.0: Highest preference (lowest cost)
  • Positive values: Relative costs (lower is better)
  • NCCL_ALGO_PROTO_IGNORE: Disable this combination

Building

make

This creates libnccl-tuner-basic.so which can be loaded by NCCL.

Usage

Loading the Plugin

export LD_LIBRARY_PATH=/path/to/basic:$LD_LIBRARY_PATH
mpirun -np 4 your_nccl_application
export NCCL_TUNER_PLUGIN=basic
export NCCL_TUNER_PLUGIN=libnccl-tuner-basic.so
export NCCL_TUNER_PLUGIN=/path/to/your/plugin/libnccl-tuner-basic.so

Verifying Plugin Loading

Enable NCCL debug output to see if the plugin is loaded:

export NCCL_DEBUG=INFO

You should see messages indicating the tuner plugin is being used.

Extending the Plugin

This basic plugin provides a foundation that you can extend:

1. Add Configuration Logic

Modify pluginGetCollInfo to implement your tuning strategy:

__hidden ncclResult_t pluginGetCollInfo(void* context, ncclFunc_t collType, size_t nBytes,
                              int numPipeOps, float** collCostTable, int numAlgo, int numProto,
                              int regBuff, int* nChannels) {
  // Your custom tuning logic here
  if (nBytes < 1024) {
    // Small message optimization
    table[NCCL_ALGO_TREE][NCCL_PROTO_SIMPLE] = 0.0;
  } else {
    // Large message optimization
    table[NCCL_ALGO_RING][NCCL_PROTO_LL128] = 0.0;
  }

  // Dynamic channel selection
  *nChannels = (nBytes > 1024*1024) ? 4 : 1;

  return ncclSuccess;
}

2. Add Context Management

Use the context pointer to store plugin state:

struct pluginContext {
  int initialized;
  size_t nRanks;
  size_t nNodes;
  // Add your plugin-specific data here
};

3. Add File-Based Configuration

Read configuration from files, environment variables, or other sources.

4. Add Topology Awareness

Use the nRanks and nNodes parameters to implement topology-specific tuning.

File Structure

basic/
├── README.md          # This file
├── plugin.c           # Plugin implementation
├── Makefile           # Build configuration
└── nccl/              # NCCL header files
    └── tuner.h        # Tuner plugin interface definitions

Next Steps

  1. Understand the Interface: Study the function signatures and parameters
  2. Implement Your Logic: Add your tuning strategy to pluginGetCollInfo
  3. Test Thoroughly: Verify your plugin works with different message sizes and topologies
  4. Add Error Handling: Implement proper error checking and resource management
  5. Document Your Changes: Update this README with your specific implementation details

Comparison with Example Plugin

  • Basic Plugin: Minimal implementation, good for learning and simple use cases
  • Example Plugin: Full-featured CSV-based configuration system, good for production use

Choose the basic plugin if you want to:

  • Learn the tuner plugin interface
  • Implement simple, hardcoded tuning strategies
  • Build a custom plugin from scratch

Choose the example plugin if you want:

  • File-based configuration
  • Complex tuning strategies
  • Production-ready features

Resources

This basic plugin provides the foundation you need to start developing custom NCCL tuner plugins. Extend it with your specific tuning logic and requirements.