Add Feature - Add NPKit Support in RCCL (#564)

* apply npkit

* fix bug

* add npkit in readme
This commit belongs to:
Ziyue Yang
2022-06-21 05:30:19 +08:00
committed by GitHub
Parent f274c865c1
commit 6e93fafdc3
14 changed files with 1236 additions and 8 deletions
+12
@@ -83,6 +83,18 @@ will run only AllReduce correctness tests with float32 datatype. See "Running a
There are also other performance and error-checking tests for RCCL. These are maintained separately at https://github.com/ROCmSoftwarePlatform/rccl-tests.
See the rccl-tests README for more information on how to build and run those tests.
## NPKit
RCCL integrates [NPKit](https://github.com/microsoft/npkit), a profiling framework for collecting fine-grained trace events from RCCL components, especially inside large collective GPU kernels.
See the [NPKit sample workflow for RCCL](https://github.com/microsoft/NPKit/tree/main/rccl_samples) for a fully automated usage example. It also serves as a good template for the manual instructions below.
To build RCCL with NPKit enabled manually, pass `-DNPKIT_FLAGS="-DENABLE_NPKIT -DENABLE_NPKIT_...(other NPKit compile-time switches)"` to the cmake command. All NPKit compile-time switches are declared in the RCCL code base as macros with the prefix `ENABLE_NPKIT_`, and they control which information is collected. Note that NPKit currently supports collecting only non-overlapping events on the GPU, so the switches passed in `-DNPKIT_FLAGS` must be chosen accordingly.
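As a hedged sketch of the build step, the Python helper below assembles the cmake arguments from a list of NPKit switches and rejects any name that lacks the `ENABLE_NPKIT_` prefix. The function name `npkit_cmake_args`, the `ENABLE_NPKIT_EXAMPLE_SWITCH` placeholder, and the `..` source directory are hypothetical. Take real switch names from the macros declared in the RCCL code base, and pick a set whose GPU events do not overlap; this helper cannot check that for you.

```python
import shlex
from typing import List, Sequence


def npkit_cmake_args(source_dir: str, switches: Sequence[str]) -> List[str]:
    """Build a cmake argument list that enables NPKit (hypothetical helper).

    `switches` are NPKit compile-time macro names. Only the prefix is
    checked; the helper does not know which switches exist or whether
    the events they enable overlap.
    """
    for name in switches:
        if not name.startswith("ENABLE_NPKIT_"):
            raise ValueError(f"not an NPKit switch: {name!r}")
    # -DENABLE_NPKIT turns NPKit on; each extra switch selects what is collected.
    flags = " ".join(["-DENABLE_NPKIT"] + [f"-D{name}" for name in switches])
    # Kept as a single argv entry, so no shell quoting is needed with subprocess.
    return ["cmake", f"-DNPKIT_FLAGS={flags}", source_dir]


if __name__ == "__main__":
    # Placeholder switch name: replace with real ENABLE_NPKIT_* macros.
    args = npkit_cmake_args("..", ["ENABLE_NPKIT_EXAMPLE_SWITCH"])
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(args))
```

Returning an argument list rather than a single string keeps the space-separated `NPKIT_FLAGS` value intact when passed to `subprocess.run`, while `shlex.join` shows the equivalent quoted shell command.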
To run RCCL with NPKit enabled manually, set the environment variable `NPKIT_DUMP_DIR` to the directory where NPKit should dump its events. Note that NPKit currently supports only one GPU per process.
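A minimal Python sketch of the run step, assuming one process per GPU: it creates the dump directory and builds a per-process environment with `NPKIT_DUMP_DIR` set. `HIP_VISIBLE_DEVICES` is the standard ROCm variable for restricting which GPUs a process sees; using it to enforce the one-GPU-per-process limit, the helper name `npkit_env`, and the `./my_rccl_app` launch line are assumptions for illustration only.

```python
import os
import tempfile
from typing import Dict, Mapping, Optional


def npkit_env(dump_dir: str, gpu_index: int,
              base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return an environment for one RCCL process with NPKit dumping enabled.

    Hypothetical helper. NPKit currently supports one GPU per process, so
    exactly one GPU is exposed through HIP_VISIBLE_DEVICES.
    """
    os.makedirs(dump_dir, exist_ok=True)  # make sure the dump directory exists
    env = dict(os.environ if base is None else base)
    env["NPKIT_DUMP_DIR"] = os.path.abspath(dump_dir)
    env["HIP_VISIBLE_DEVICES"] = str(gpu_index)
    return env


if __name__ == "__main__":
    dump_dir = os.path.join(tempfile.gettempdir(), "npkit_dump")
    env = npkit_env(dump_dir, gpu_index=0)
    print(env["NPKIT_DUMP_DIR"], env["HIP_VISIBLE_DEVICES"])
    # Hypothetical launch of an NPKit-enabled RCCL application:
    # subprocess.run(["./my_rccl_app"], env=env, check=True)
```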
To analyze the NPKit dump results manually, use [npkit_trace_generator.py](https://github.com/microsoft/NPKit/blob/main/rccl_samples/npkit_trace_generator.py).
## Library and API Documentation
Please refer to the [Library documentation](https://rccl.readthedocs.io/) for current documentation.