Disable MSCCL for the non-multi-process case by default (#1307)
* Added `RCCL_MSCCL_ENABLE_SINGLE_PROCESS` runtime flag to return to the original MSCCL enablement behaviour except when explicitly enabling for multi-thread.
* Added documentation for the new `RCCL_MSCCL_ENABLE_SINGLE_PROCESS` runtime env var.
Committed by GitHub
Parent: 1a48e19b18
Commit: e056fe8f7e
+2 -2
@@ -128,9 +128,9 @@ To manually run RCCL with NPKit enabled, environment variable `NPKIT_DUMP_DIR` n
 To manually analyze NPKit dump results, please leverage [npkit_trace_generator.py](https://github.com/microsoft/NPKit/blob/main/rccl_samples/npkit_trace_generator.py).

 ## MSCCL/MSCCL++
-RCCL integrates MSCCL(https://github.com/microsoft/msccl) and MSCCL++ (https://github.com/microsoft/mscclpp) to leverage the highly efficient GPU-GPU communication primitives for collective operations. Thanks to Microsoft Corporation for collaborating with us in this project.
+RCCL integrates [MSCCL](https://github.com/microsoft/msccl) and [MSCCL++](https://github.com/microsoft/mscclpp) to leverage the highly efficient GPU-GPU communication primitives for collective operations. Thanks to Microsoft Corporation for collaborating with us in this project.

-MSCCL uses XMLs for different collective algorithms on different architectures. RCCL collectives can leverage those algorithms once the corresponding XML has been provided by the user. The XML files contain the sequence of send-recv and reduction operations to be executed by the kernel. On MI300X, MSCCL is enabled by default. On other platforms, the users may have to enable this by setting `RCCL_MSCCL_FORCE_ENABLE=1`.
+MSCCL uses XMLs for different collective algorithms on different architectures. RCCL collectives can leverage those algorithms once the corresponding XML has been provided by the user. The XML files contain the sequence of send-recv and reduction operations to be executed by the kernel. On MI300X, MSCCL is enabled by default. On other platforms, the users may have to enable this by setting `RCCL_MSCCL_FORCE_ENABLE=1`. By default, MSCCL will only be used if every rank belongs to a unique process; to disable this restriction for multi-threaded or single-threaded configurations, set `RCCL_MSCCL_ENABLE_SINGLE_PROCESS=1`.

 On the other hand, RCCL allreduce and allgather collectives can leverage the efficient MSCCL++ communication kernels for certain message sizes. MSCCL++ support is available whenever MSCCL support is available. Users need to set the RCCL environment variable `RCCL_ENABLE_MSCCLPP=1` to run RCCL workload with MSCCL++ support. It is also possible to set the message size threshold for using MSCCL++ by using the environment variable `RCCL_MSCCLPP_THRESHOLD`. Once `RCCL_MSCCLPP_THRESHOLD` (the default value is 1MB) is set, RCCL will invoke MSCCL++ kernels for all message sizes less than or equal to the specified threshold.

@@ -28,6 +28,7 @@

 RCCL_PARAM(MscclEnabled, "MSCCL_ENABLE", 1);
 RCCL_PARAM(MscclForceEnabled, "MSCCL_FORCE_ENABLE", 0);
+RCCL_PARAM(MscclEnableSingleProcess, "MSCCL_ENABLE_SINGLE_PROCESS", 0);
 static const char* mscclAlgoFilePathEnv = "MSCCL_ALGO_FILE_PATH";

 bool mscclEnabled() {
@@ -63,7 +64,23 @@ bool mscclAvailable(int rank) {
 }

 static bool mscclCommCompatible(ncclComm_t comm) {
-  // MSCCL is always compatible now. No need to guard against multi-thread.
+  if (rcclParamMscclEnableSingleProcess()) {
+    // Single process usage enabled. No need to guard against multi-thread.
+    return true;
+  }
+
+  std::map<uint64_t, std::set<uint64_t>> hostHashToPidHashes;
+  for (int i = 0; i < comm->nRanks; i++) {
+    uint64_t hostHash = comm->peerInfo[i].hostHash;
+    uint64_t pidHash = comm->peerInfo[i].pidHash;
+    if (hostHashToPidHashes.find(hostHash) != hostHashToPidHashes.end()) {
+      auto& pidHashSet = hostHashToPidHashes[hostHash];
+      if (pidHashSet.find(pidHash) != pidHashSet.end()) {
+        return false;
+      }
+    }
+    hostHashToPidHashes[hostHash].insert(pidHash);
+  }
   return true;
 }

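The check in `mscclCommCompatible` can be exercised outside RCCL with a small standalone sketch. The code below is illustrative only and is not part of the commit: `PeerIds` and `ranksInUniqueProcesses` are hypothetical names, the `enableSingleProcess` argument stands in for `rcclParamMscclEnableSingleProcess()`, and the duplicate test is condensed to the return value of `std::set::insert` instead of the separate `find` calls used in the diff.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-in for the two peerInfo fields the check reads.
struct PeerIds {
  uint64_t hostHash;  // identifies the host a rank runs on
  uint64_t pidHash;   // identifies the process a rank runs in
};

// Returns true when no two ranks share both a host and a process,
// or when the single-process override is requested.
bool ranksInUniqueProcesses(const std::vector<PeerIds>& peers,
                            bool enableSingleProcess) {
  if (enableSingleProcess) {
    return true;  // restriction lifted by the caller
  }
  std::map<uint64_t, std::set<uint64_t>> hostHashToPidHashes;
  for (const PeerIds& peer : peers) {
    // insert().second is false if this pidHash was already seen on this host.
    if (!hostHashToPidHashes[peer.hostHash].insert(peer.pidHash).second) {
      return false;
    }
  }
  return true;
}
```

With this sketch, two ranks with the same `hostHash` and `pidHash` (for example, two threads of one process) make the function return `false` unless the override is passed, while ranks that reuse a `pidHash` on different hosts are still accepted.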