NCCL Inspector Plugin

The NCCL Inspector is a plugin for the NVIDIA Collective Communications Library (NCCL) that provides detailed, per-communicator, per-collective performance and metadata logging. It is designed to help users analyze and debug NCCL collective operations by generating structured JSON output for each operation.

Related documentation:

  • Performance Exporter: Tool for analyzing and visualizing NCCL performance data from Inspector logs

Folder Location

The Inspector plugin source is located in:

ext-profiler/inspector/

Building the Inspector Plugin

To build the Inspector plugin, run:

make

The build system automatically detects the CUDA and NCCL installations in your environment. To use custom paths, set the CUDA_HOME and NCCL_HOME environment variables, or pass them as make arguments (for example, make CUDA_HOME=/path/to/cuda NCCL_HOME=/path/to/nccl).

Build Options

The Makefile supports several build options:

  • DEBUG=1: Enable debug build with additional debugging information
  • ASAN=1: Enable Address Sanitizer for memory error detection
  • UBSAN=1: Enable Undefined Behavior Sanitizer

Example debug build:

make DEBUG=1

Build Output

The build process creates:

  • libnccl-profiler-inspector.so: The main inspector plugin library
  • version.cc: Auto-generated version information from git

Using NCCL Inspector

Key Differences from Normal NCCL Usage

The main difference between a normal NCCL run and a run with the Inspector plugin is a set of additional environment variables that enable detailed performance logging:

Normal NCCL Run:

# Standard NCCL execution
./your_nccl_application

NCCL Inspector Run:

# NCCL Inspector enabled execution
export NCCL_PROFILER_PLUGIN=/path/to/nccl/ext-profiler/inspector/libnccl-profiler-inspector.so
export NCCL_INSPECTOR_ENABLE=1
export NCCL_INSPECTOR_DUMP_THREAD_INTERVAL_MICROSECONDS=500
./your_nccl_application

Environment Variables

  • NCCL_PROFILER_PLUGIN=/path/to/nccl/ext-profiler/inspector/libnccl-profiler-inspector.so (required): Loads the Inspector plugin into NCCL.
  • NCCL_INSPECTOR_ENABLE=1 (required): Enables the Inspector plugin.
  • NCCL_INSPECTOR_DUMP_THREAD_INTERVAL_MICROSECONDS=<interval> (required): Sets the interval, in microseconds, at which the internal dump thread writes output. Example: 500.
  • NCCL_INSPECTOR_DUMP_DIR=<output_dir> (optional): Sets the output directory for logs. If not set, the default is nccl-inspector-<slurm_job_id> when running under SLURM and nccl-inspector-unknown-jobid otherwise.
  • NCCL_INSPECTOR_DUMP_VERBOSE=<0|1> (optional): Controls verbose output, which includes event trace information. Set to 1 to enable or 0 to disable (default).
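
When a job is launched from a script, the variables listed above can be assembled programmatically. The following is a minimal Python sketch and is not part of the plugin: the helper name inspector_env and its parameters are hypothetical, and only the environment variable names and values are taken from the list above.

```python
import os


def inspector_env(plugin_path, interval_us=500, dump_dir=None,
                  verbose=False, base_env=None):
    """Return a copy of the environment with the Inspector variables set.

    Hypothetical helper for illustration; only the variable names come
    from this README.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["NCCL_PROFILER_PLUGIN"] = str(plugin_path)  # loads the plugin into NCCL
    env["NCCL_INSPECTOR_ENABLE"] = "1"              # enables the Inspector
    env["NCCL_INSPECTOR_DUMP_THREAD_INTERVAL_MICROSECONDS"] = str(int(interval_us))
    if dump_dir is not None:                        # optional output directory
        env["NCCL_INSPECTOR_DUMP_DIR"] = str(dump_dir)
    if verbose:                                     # optional event trace output
        env["NCCL_INSPECTOR_DUMP_VERBOSE"] = "1"
    return env


# Example use (not executed here):
#   import subprocess
#   plugin = "/path/to/nccl/ext-profiler/inspector/libnccl-profiler-inspector.so"
#   subprocess.run(["./your_nccl_application"], env=inspector_env(plugin))
```

Returning a new dictionary instead of modifying os.environ keeps the launcher's own environment unchanged, which is convenient when the same script starts both instrumented and uninstrumented runs.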

Example Usage

Single Node:

export NCCL_PROFILER_PLUGIN=/path/to/nccl/ext-profiler/inspector/libnccl-profiler-inspector.so
export NCCL_INSPECTOR_ENABLE=1
export NCCL_INSPECTOR_DUMP_THREAD_INTERVAL_MICROSECONDS=500
./build/test/perf/all_reduce_perf -b 8 -e 16G -f 2 -g 8

Multi-Node (SLURM):

# Add these environment variables to your SLURM script
export NCCL_PROFILER_PLUGIN=/path/to/nccl/ext-profiler/inspector/libnccl-profiler-inspector.so
export NCCL_INSPECTOR_ENABLE=1
export NCCL_INSPECTOR_DUMP_THREAD_INTERVAL_MICROSECONDS=500
export NCCL_INSPECTOR_DUMP_DIR=/path/to/logs/${SLURM_JOB_ID}/

# Then run your normal NCCL application
srun your_nccl_application

Example Scripts

For detailed example scripts showing how to integrate NCCL Inspector with different workloads, see the test/examples/ directory:

  • Single Node Example: Basic NCCL performance testing with inspector
  • Multi-Node SLURM Example: Comprehensive multi-node testing with various collective operations
  • Training Workload Example: Integration with distributed training workloads

Output Example

Each output file contains JSON objects with the following structure:

{
  "header": {
    "id": "0x7f8c496ae9f661",
    "rank": 2,
    "n_ranks": 8,
    "nnodes": 1
  },
  "metadata": {
    "inspector_output_format_version": "v4.0",
    "git_rev": "",
    "rec_mechanism": "profiler_plugin",
    "dump_timestamp_us": 1748030377748202,
    "hostname": "example-hostname",
    "pid": 1639453
  },
  "coll_perf": {
    "coll": "AllReduce",
    "coll_sn": 1407,
    "coll_msg_size_bytes": 17179869184,
    "coll_exec_time_us": 61974,
    "coll_algobw_gbs": 277.210914,
    "coll_busbw_gbs": 485.119099
  }
}
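
The bandwidth fields in this example can be cross-checked by hand. The sketch below assumes the conventions documented for nccl-tests: algorithm bandwidth is the message size divided by the execution time, and AllReduce bus bandwidth scales it by 2(n-1)/n. Whether the Inspector computes its fields in exactly this way is an assumption, although the numbers above are consistent with it. The function names are illustrative.

```python
def algobw_gbs(msg_size_bytes, exec_time_us):
    """Algorithm bandwidth in GB/s, with 1 GB = 1e9 bytes."""
    # bytes per microsecond equals MB/s; divide by 1e3 to get GB/s.
    return msg_size_bytes / exec_time_us / 1e3


def allreduce_busbw_gbs(algobw, n_ranks):
    """AllReduce bus bandwidth under the nccl-tests convention (assumed)."""
    return algobw * 2 * (n_ranks - 1) / n_ranks


# Values copied from the AllReduce example above.
algo = algobw_gbs(msg_size_bytes=17179869184, exec_time_us=61974)
bus = allreduce_busbw_gbs(algo, n_ranks=8)
print(f"{algo:.6f} {bus:.6f}")  # 277.210914 485.119099
```

Other collectives use different scaling factors, so this check should not be applied unchanged to records whose coll field is not AllReduce.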

Verbose Output Example

To enable verbose output with event trace information, set the NCCL_INSPECTOR_DUMP_VERBOSE environment variable to 1:

export NCCL_INSPECTOR_DUMP_VERBOSE=1

This adds event trace information to the JSON output: the order in which callbacks occurred (event_trace_sn) and the timestamp of each individual event (event_trace_ts).

{
  "header": {
    "id": "0xe62dedaa97644a",
    "rank": 4,
    "n_ranks": 8,
    "nnodes": 1
  },
  "metadata": {
    "inspector_output_format_version": "v4.0",
    "git_rev": "9019a1912-dirty",
    "rec_mechanism": "nccl_profiler_interface",
    "dump_timestamp_us": 1752867229276385,
    "hostname": "example-hostname",
    "pid": 438776
  },
  "coll_perf": {
    "coll": "ReduceScatter",
    "coll_sn": 1231,
    "coll_msg_size_bytes": 2147483648,
    "coll_exec_time_us": 41057,
    "coll_timing_source": "kernel_gpu",
    "coll_algobw_gbs": 418.439467,
    "coll_busbw_gbs": 366.134533,
    "event_trace_sn": {
      "coll_start_sn": 1,
      "coll_stop_sn": 2,
      "kernel_events": [
        {
          "channel_id": 0,
          "kernel_start_sn": 3,
          "kernel_stop_sn": 48,
          "kernel_record_sn": 47
        }
      ]
    },
    "event_trace_ts": {
      "coll_start_ts": 1752867229235059,
      "coll_stop_ts": 1752867229235064,
      "kernel_events": [
        {
          "channel_id": 0,
          "kernel_start_ts": 1752867229235181,
          "kernel_stop_ts": 1752867229275811,
          "kernel_record_ts": 1752867229275811
        }
      ]
    }
  }
}
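
The event_trace_ts block can be used to derive a per-channel kernel span. The following is a minimal sketch that assumes the *_ts fields are microsecond timestamps (their magnitude matches dump_timestamp_us); the function name is illustrative.

```python
def kernel_spans_us(coll_perf):
    """Map channel_id to kernel_stop_ts - kernel_start_ts for a verbose record."""
    trace = coll_perf.get("event_trace_ts", {})
    return {
        event["channel_id"]: event["kernel_stop_ts"] - event["kernel_start_ts"]
        for event in trace.get("kernel_events", [])
    }


# Values copied from the verbose example above.
coll_perf = {
    "coll_exec_time_us": 41057,
    "coll_timing_source": "kernel_gpu",
    "event_trace_ts": {
        "coll_start_ts": 1752867229235059,
        "coll_stop_ts": 1752867229235064,
        "kernel_events": [
            {
                "channel_id": 0,
                "kernel_start_ts": 1752867229235181,
                "kernel_stop_ts": 1752867229275811,
                "kernel_record_ts": 1752867229275811,
            }
        ],
    },
}
print(kernel_spans_us(coll_perf))  # {0: 40630}
```

The span derived here (40630) is not identical to coll_exec_time_us (41057) in the same record, whose coll_timing_source is reported as kernel_gpu, so the two values should not be expected to coincide.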

Multiple such JSON objects are written, one per collective operation per communicator.
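
Because a file holds many JSON objects rather than a single document, one json.load call is not enough to read it. The following is a minimal sketch that assumes only that the objects are written back to back, separated by whitespace or newlines; the exact on-disk layout is not specified here, and the function names are illustrative.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def mean_busbw_by_size(records):
    """Average coll_busbw_gbs per (collective, message size) pair."""
    groups = {}
    for record in records:
        perf = record["coll_perf"]
        key = (perf["coll"], perf["coll_msg_size_bytes"])
        groups.setdefault(key, []).append(perf["coll_busbw_gbs"])
    return {key: sum(values) / len(values) for key, values in groups.items()}
```

Decoding with raw_decode works whether each object occupies one line or is pretty-printed across several, so the reader does not depend on a particular layout.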

Output Directory

  • By default, output files are written to:
    • nccl-inspector-unknown-jobid (if no SLURM job ID is present)
    • nccl-inspector-<slurm_job_id> (if running under SLURM)
  • You can override this with the NCCL_INSPECTOR_DUMP_DIR environment variable.
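
A post-processing script can locate the logs by mirroring the rules above. The following is a minimal sketch, not the plugin's own logic: it assumes the SLURM job ID is read from the standard SLURM_JOB_ID environment variable and that an empty value counts as unset. The function name is illustrative.

```python
import os


def resolve_dump_dir(env=None):
    """Return the directory where Inspector output is expected."""
    env = os.environ if env is None else env
    override = env.get("NCCL_INSPECTOR_DUMP_DIR")
    if override:                       # explicit override wins
        return override
    job_id = env.get("SLURM_JOB_ID")   # assumption: standard SLURM variable
    if job_id:
        return f"nccl-inspector-{job_id}"
    return "nccl-inspector-unknown-jobid"
```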

Additional Notes

  • The plugin is compatible with standard NCCL workflows and can be used in both single-node and multi-node (SLURM) environments.
  • For more details, see the source code and comments in ext-profiler/inspector/.