Add Feature - Add NPKit Support in RCCL (#564)
* apply npkit
* fix bug
* add npkit in readme
@@ -83,6 +83,18 @@ will run only AllReduce correctness tests with float32 datatype. See "Running a
There are also other performance and error-checking tests for RCCL. These are maintained separately at https://github.com/ROCmSoftwarePlatform/rccl-tests.
See the rccl-tests README for more information on how to build and run those tests.

## NPKit
RCCL integrates [NPKit](https://github.com/microsoft/npkit), a profiler framework that collects fine-grained trace events in RCCL components, especially in giant collective GPU kernels.

See the [NPKit sample workflow for RCCL](https://github.com/microsoft/NPKit/tree/main/rccl_samples) for a fully automated usage example. It also provides good templates for the manual instructions below.

To manually build RCCL with NPKit enabled, pass `-DNPKIT_FLAGS="-DENABLE_NPKIT -DENABLE_NPKIT_...(other NPKit compile-time switches)"` to the cmake command. All NPKit compile-time switches are declared in the RCCL code base as macros with the prefix `ENABLE_NPKIT_`, and they control which information is collected. Note that NPKit currently supports collecting only non-overlapped events on the GPU, so the switches passed in `-DNPKIT_FLAGS` must follow this rule.

To manually run RCCL with NPKit enabled, set the environment variable `NPKIT_DUMP_DIR` to the NPKit event dump directory. Note that NPKit currently supports only 1 GPU per process.

To manually analyze NPKit dump results, use [npkit_trace_generator.py](https://github.com/microsoft/NPKit/blob/main/rccl_samples/npkit_trace_generator.py).
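The manual build step above can be sketched as a small helper that assembles the `-DNPKIT_FLAGS` argument. This is only an illustration: the helper names and `ENABLE_NPKIT_EXAMPLE_SWITCH` are placeholders invented for this sketch, not macros or tooling from RCCL. Look up the real `ENABLE_NPKIT_` macros in the RCCL code base before using it.

```python
import shlex

NPKIT_PREFIX = "ENABLE_NPKIT_"


def npkit_cmake_arg(switches):
    """Return the -DNPKIT_FLAGS=... cmake argument for the given switch macros."""
    for name in switches:
        # Every NPKit compile-time switch is a macro with the ENABLE_NPKIT_ prefix.
        if not name.startswith(NPKIT_PREFIX):
            raise ValueError(f"not an NPKit switch: {name!r}")
    # -DENABLE_NPKIT always comes first, followed by the selected switches.
    defines = ["-DENABLE_NPKIT"] + [f"-D{name}" for name in switches]
    return "-DNPKIT_FLAGS=" + " ".join(defines)


def cmake_command(source_dir, switches):
    """Build the argument list for configuring RCCL with NPKit enabled."""
    return ["cmake", npkit_cmake_arg(switches), source_dir]


if __name__ == "__main__":
    # ENABLE_NPKIT_EXAMPLE_SWITCH is a placeholder, not a real RCCL macro.
    cmd = cmake_command("..", ["ENABLE_NPKIT_EXAMPLE_SWITCH"])
    print(shlex.join(cmd))
```

The helper only checks the prefix. It cannot tell whether the chosen switches describe overlapping GPU events, so that rule still has to be checked by hand.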
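For the manual run step, a minimal launcher sketch might set `NPKIT_DUMP_DIR` for a single process, as below. The function name `run_with_npkit` and the choice to create the directory beforehand are assumptions made for this illustration, and the child command is a stand-in that merely echoes the variable rather than a real RCCL program.

```python
import os
import subprocess
import sys


def run_with_npkit(command, dump_dir, **run_kwargs):
    """Run `command` with NPKIT_DUMP_DIR set to `dump_dir`."""
    # Assumption: creating the dump directory up front is harmless.
    os.makedirs(dump_dir, exist_ok=True)
    # Copy the current environment and add the NPKit dump location.
    env = dict(os.environ, NPKIT_DUMP_DIR=os.path.abspath(dump_dir))
    return subprocess.run(command, env=env, check=True, **run_kwargs)


if __name__ == "__main__":
    # Stand-in for an RCCL program: a child process that prints the variable.
    child = [sys.executable, "-c",
             "import os; print(os.environ['NPKIT_DUMP_DIR'])"]
    run_with_npkit(child, "npkit_dump")
```

Because NPKit currently supports only 1 GPU per process, a multi-GPU job would need one such process per GPU.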
## Library and API Documentation

Please refer to the [Library documentation](https://rccl.readthedocs.io/) for current documentation.