rocm-systems/tools/TransferBench/example.cfg

#Configfile Format:
#==================
#A Link is defined as a uni-directional transfer from src memory location to dst memory location
#Each single line in the configuration file defines a set of Links to run in parallel

#There are two ways to specify the configuration file:

#1) Basic
#   The basic specification assumes the same number of threadblocks/CUs used per link
#   A positive number of Links is specified followed by that number of triplets describing each Link

   #Links #CUs (GPUIndex1 srcMem1 dstMem1) ... (GPUIndexL srcMemL dstMemL)

#2) Advanced
#   The advanced specification allows different number of threadblocks/CUs used per Link
#   A negative number of links is specified, followed by quadruples describing each Link
#   -#Links (GPUIndex1 #CUs1 srcMem1 dstMem1) ... (GPUIndexL #CUsL srcMemL dstMemL)

#Argument Details:
#  #Links  :   Number of Links to be run in parallel
#  #CUs    :   Number of threadblocks/CUs to use for a Link
#  GpuIndex:   0-indexed GPU id executing the Link
#  srcMemL :   Source memory location (Where the data is to be read from). Ignored in memset mode
#  dstMemL :   Destination memory location (Where the data is to be written to)
#              Memory locations are specified by a character indicating memory type, followed by GPU device index (0-indexed)
#              Supported memory locations are:
#              - P:    Pinned host memory   (on CPU, on NUMA node closest to provided GPU index)
#              - G:    Global device memory (on GPU)
#Round brackets may be included for human clarity, but will be ignored

#Examples:
#1 4 (0 G0 G1)              Single Link that uses 4 CUs on GPU 0 that reads memory from GPU 0 and copies it to memory on GPU 1
#1 4 (0 G1 G0)              Single Link that uses 4 CUs on GPU 0 that reads memory from GPU 1 and copies it to memory on GPU 0
#1 4 (2 P0 G2)              Single Link that uses 4 CUs on GPU 2 that reads memory from CPU 0 and copies it to memory on GPU 2
#2 4 (0 G0 G1) (1 G1 G0)    Runs 2 Links in parallel.  GPU 0 - > GPU1, and GP1 -> GPU 0, each with 4 CUs
#-2 (0 G0 G1 4) (1 G1 G0 2) Runs 2 Links in parallel.  GPU 0 - > GPU 1 using four CUs, and GPU1 -> GPU 0 using two CUs

# Single link between GPUs 0 and 1
1 1 (0 G0 G1)