diff --git a/README.md b/README.md
index f0f6347c39..a3a4336870 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@ These tests check both the performance and the correctness of RCCL operations. T
 
 ## Build
 
-To build the tests, just type `make`.
+To build the tests, just type `make` or `make -j`
 
 If HIP is not installed in `/opt/rocm`, you may specify `HIP_HOME`. Similarly, if RCCL (`librccl.so`) is not installed in `/opt/rocm/lib/`, you may specify `NCCL_HOME` and `CUSTOM_RCCL_LIB`.
 
@@ -75,12 +75,14 @@ RCCL Tests can run on multiple processes, multiple threads, and multiple HIP dev
 ### Quick examples
 
 Run on single node with 8 GPUs (`-g 8`), scanning from 8 Bytes to 128MBytes :
+
 ```shell
 $ ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 8
 ```
 
 Run 64 MPI processes on nodes with 8 GPUs each, for a total of 64 GPUs spread across 8 nodes :
 (NB: The rccl-tests binaries must be compiled with `MPI=1` for this case)
+
 ```shell
 $ mpirun -np 64 -N 8 ./build/all_reduce_perf -b 8 -e 8G -f 2 -g 1
 ```
@@ -138,8 +140,8 @@ All tests support the same set of arguments :
  * `-z,--blocking <0/1>` Make RCCL collective blocking, i.e. have CPUs wait and sync after each collective. Default : 0.
  * `-G,--hipgraph ` Capture iterations as a HIP graph and then replay specified number of times. Default : 0.
  * `-C,--report_cputime <0/1>]` Report CPU time instead of latency. Default : 0.
- * `-R,--local_register <1/0>` enable local buffer registration on send/recv buffers. Default : 0.
- * `-T,--timeout