@@ -16,20 +16,20 @@ particular version of the GPU stack (such as NVIDIA CUDA), from the network code
 particular version of the networking stack. Using this method, you can easily integrate any CUDA version
 with any network stack version.
 
-NCCL network plugins are packaged as a shared library called ``libnccl-net.so``. The shared library
+NCCL network plugins are packaged as a shared library called ``librccl-net.so``. The shared library
 contains one or more implementations of the NCCL Net API in the form of versioned structs,
 which are filled with pointers to all required functions.
 
 Plugin architecture
 ===================
 
-When NCCL is initialized, it searches for a ``libnccl-net.so`` library and dynamically loads it,
+When NCCL is initialized, it searches for a ``librccl-net.so`` library and dynamically loads it,
 then searches for symbols inside the library.
 
 The ``NCCL_NET_PLUGIN`` environment variable allows multiple plugins to coexist. If it's set, NCCL
-looks for a library named ``libnccl-net-${NCCL_NET_PLUGIN}.so``. It is therefore
-recommended that you name the library according to that pattern, with a symlink pointing from ``libnccl-net.so``
-to ``libnccl-net-${NCCL_NET_PLUGIN}.so``. This lets users select the correct plugin
+looks for a library named ``librccl-net-${NCCL_NET_PLUGIN}.so``. It is therefore
+recommended that you name the library according to that pattern, with a symlink pointing from ``librccl-net.so``
+to ``librccl-net-${NCCL_NET_PLUGIN}.so``. This lets users select the correct plugin
 if there are multiple plugins in the path.
 
 Struct versioning
@@ -169,7 +169,7 @@ Initialization
 
 Setting ``NCCL_NET=<plugin name>`` ensures a specific network implementation is used, with
 a matching ``name``. This is not to be confused with ``NCCL_NET_PLUGIN`` which defines a suffix for the
-``libnccl-net.so`` library name to load.
+``librccl-net.so`` library name to load.
 
 * ``init`` - As soon as NCCL finds the plugin and the correct ``ncclNet`` symbol, it calls the ``init`` function. This allows the plugin to discover network devices and ensure they are usable.
 If the ``init`` function does not return ``ncclSuccess``, then NCCL does not use the plugin and falls back to internal ones.
@@ -22,7 +22,7 @@ enum ncclPluginType {
 #define NUM_LIBS 3
 static void *libHandles[NUM_LIBS];
 static const char *pluginNames[NUM_LIBS] = { "NET", "TUNER", "PROFILER" };
-static const char *pluginPrefix[NUM_LIBS] = { "libnccl-net", "libnccl-tuner", "libnccl-profiler" };
+static const char *pluginPrefix[NUM_LIBS] = { "librccl-net", "librccl-tuner", "librccl-profiler" };
 static const char *pluginFallback[NUM_LIBS] = { "Using internal net plugin.", "Using internal tuner plugin.", "" };
 static unsigned long subsys[NUM_LIBS] = { NCCL_INIT|NCCL_NET, NCCL_INIT|NCCL_TUNING, NCCL_INIT };
 
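
To illustrate the naming rule the documentation hunks describe (a prefix such as `librccl-net`, optionally followed by `-${NCCL_NET_PLUGIN}`, then `.so`), here is a minimal sketch in C. The helper `buildPluginLibName` and its signature are hypothetical and are not part of this patch; the actual loader code may build the name differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not taken from the patch: builds the shared-library
 * name described in the documentation. `suffix` stands for the value of
 * NCCL_NET_PLUGIN (NULL or "" when the variable is unset).
 * Returns 0 on success, -1 if the name does not fit in `out`. */
static int buildPluginLibName(const char *prefix, const char *suffix,
                              char *out, size_t outLen) {
  assert(prefix != NULL && out != NULL);
  int n;
  if (suffix != NULL && suffix[0] != '\0') {
    /* Variable set: <prefix>-<suffix>.so */
    n = snprintf(out, outLen, "%s-%s.so", prefix, suffix);
  } else {
    /* Variable unset: plain <prefix>.so */
    n = snprintf(out, outLen, "%s.so", prefix);
  }
  /* snprintf reports the length it wanted to write; reject truncation. */
  return (n >= 0 && (size_t)n < outLen) ? 0 : -1;
}
```

Under these assumptions, a caller would pass `getenv("NCCL_NET_PLUGIN")` as `suffix` and hand the resulting name to `dlopen()`. With the new prefixes, an unset variable yields `librccl-net.so`, which is why the documentation recommends a symlink from that name to the suffixed library.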