#Configfile Format: #================== #A Link is defined as a uni-directional transfer from src memory location to dst memory location #Each single line in the configuration file defines a set of Links to run in parallel #There are two ways to specify the configuration file: #1) Basic # The basic specification assumes the same number of threadblocks/CUs used per link # A positive number of Links is specified followed by that number of triplets describing each Link #Links #CUs (GPUIndex1 srcMem1 dstMem1) ... (GPUIndexL srcMemL dstMemL) #2) Advanced # The advanced specification allows different number of threadblocks/CUs used per Link # A negative number of links is specified, followed by quadruples describing each Link # -#Links (GPUIndex1 #CUs1 srcMem1 dstMem1) ... (GPUIndexL #CUsL srcMemL dstMemL) #Argument Details: # #Links : Number of Links to be run in parallel # #CUs : Number of threadblocks/CUs to use for a Link # GpuIndex: 0-indexed GPU id executing the Link # srcMemL : Source memory location (Where the data is to be read from). Ignored in memset mode # dstMemL : Destination memory location (Where the data is to be written to) # Memory locations are specified by a character indicating memory type, followed by GPU device index (0-indexed) # Supported memory locations are: # - P: Pinned host memory (on CPU, on NUMA node closest to provided GPU index) # - G: Global device memory (on GPU) #Round brackets may be included for human clarity, but will be ignored #Examples: #1 4 (0 G0 G1) Single Link that uses 4 CUs on GPU 0 that reads memory from GPU 0 and copies it to memory on GPU 1 #1 4 (0 G1 G0) Single Link that uses 4 CUs on GPU 0 that reads memory from GPU 1 and copies it to memory on GPU 0 #1 4 (2 P0 G2) Single Link that uses 4 CUs on GPU 2 that reads memory from CPU 0 and copies it to memory on GPU 2 #2 4 (0 G0 G1) (1 G1 G0) Runs 2 Links in parallel. GPU 0 - > GPU1, and GP1 -> GPU 0, each with 4 CUs #-2 (0 G0 G1 4) (1 G1 G0 2) Runs 2 Links in parallel. GPU 0 - > GPU 1 using four CUs, and GPU1 -> GPU 0 using two CUs # Single link between GPUs 0 and 1 1 1 (0 G0 G1)