The test measures the bandwidth between GPUs. Currently we do not take the NUMA topology into account, as some products do support P2P across PCI-e root complexes.

Test result on a system with two gfx900 GPUs:

[ RUN ] KFDPerformanceTest.P2PBandWidthTest
[ ] Copy from node to node by [push, NONE]
[ ] [1 -> 0] 6.13477 - 6.12695 GB/s
[ ] [1 -> 2] 3.77734 - 3.76855 GB/s
[ ] [2 -> 0] 6.67676 - 6.6543 GB/s
[ ] [2 -> 1] 6.14453 - 6.12793 GB/s
[ ] Copy from node to node by [pull, NONE]
[ ] [1 -> 0] 6.10547 - 6.08105 GB/s
[ ] [1 -> 2] 9.65527 - 9.65039 GB/s
[ ] [2 -> 0] 6.49805 - 6.4873 GB/s
[ ] [2 -> 1] 8.95508 - 8.85254 GB/s
[ ] Full duplex copy from node to node by [push|pull, NONE]
[ ] [1 -> 0] 11.0986 - 11.0986 GB/s
[ ] [1 -> 2] 7.54297 - 7.54297 GB/s
[ ] [2 -> 0] 12.0264 - 11.9639 GB/s
[ ] [2 -> 1] 12.0469 - 12.0371 GB/s
[ ] Full duplex copy from node to node by [push, push]
[ ] [1 <-> 2] 11.7324 - 11.4541 GB/s
[ ] Full duplex copy from node to node by [pull, pull]
[ ] [1 <-> 2] 11.4824 - 11.0508 GB/s
[ ] Copy from node to multiple nodes by [push, NONE]
[ ] [1 -> [0...2]] 5.625 - 5.73633 GB/s
[ ] [2 -> [0...2]] 6.45801 - 6.4707 GB/s
[ ] Copy from multiple nodes to node by [push, NONE]
[ ] [[1...2] -> 0] 12.8379 - 12.2578 GB/s

Now we can get more timestamp info like below.
Copy from node to node by [push, NONE]
[1 -> 0]
[1 : 0] #-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-##-#-#-#-#-###############################
[1 : 1] ####################################################################################################
[1 -> 2]
[1 : 0] #--#-#-#-#-#--#-#-#-#-#--#-#-#-#-#--#-#-#-#-#-#--#-#-#-#-#--#-#-#-#-#--#-#-#-#-#-#--#-#-#-#-#--#-#-######################################
[1 : 1] ##################################################################################################-#
[2 -> 0]
[2 : 0] ##-###-##-###-###-##-###-##-###-###-##-###-###-##-###-###-##-###-##-###-###-##-###-###-##-###-###-#################
[2 : 1] ###############################################################################-#############-###-##
[2 -> 1]
[2 : 0] ##-##-##-##-##-###-##-##-##-##-##-###-##-##-##-##-###-##-##-##-##-##-###-##-##-##-##-###-##-##-##-####################
[2 : 1] ################################################################################-###-############-##
[snip]
Full duplex copy from node to node by [push, push]
[1 <-> 2]
[1 : 0] #-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-####################################
[1 : 1] ################-###################################################-############-####-#############
[2 : 2] #-##-##-##-#-##-##-##-##-#-##-##-##-##-#-##-##-##-##-#-##-##-##-##-#-##-##-##-##-##-#-##-##-##-##-#-##-##-##-##-##-#-##################
[2 : 3] #####-######-#####-######-#####-######-#####-######-#####-######-#####-######-#####-######-#####-######-#####-#####-##
Full duplex copy from node to node by [pull, pull]
[1 <-> 2]
[1 : 0] ######################################################################-##-#-###############-####-###
[1 : 1] #-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-############################
[2 : 2] ##-##-##-##-###-##-##-##-##-###-##-##-##-###-##-##-##-##-###-##-##-##-###-##-##-##-##-###-##-##-##-##-###-##-##-##-###-##-##-############
[2 : 3] #-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#########-#############
Copy from node to multiple nodes by [push, NONE]
[1 -> [0...2]]
[1 : 0] #-#--#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-###############################
[1 : 1] ########################################################################################-###-###-###
[2 -> [0...2]]
[2 : 0] ##-##-##-###-##-###-##-##-###-##-###-##-##-###-##-###-##-###-##-##-###-##-###-##-##-###-##-###-##-##################
[2 : 1] -################################################################################################-##
Copy from multiple nodes to node by [push, NONE]
[[1...2] -> 0]
[1 : 0] #-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-###############################
[1 : 1] ################################################################################################-#-#
[2 : 2] ##-##-##-###-##-##-###-##-##-##-###-##-##-###-##-##-###-##-##-###-##-##-##-###-##-##-###-##-##-###-##-##################
[2 : 3] #########################-#########################-#########################-#########################
[ OK ] KFDPerformanceTest.P2PBandWidthTest (15982 ms)

Change-Id: Ia90044191d51650ccb220476d31fb317aa3ad6ce
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
ROCt Library
This repository includes the user-mode API interfaces used to interact with the ROCk driver. Currently supported agents include only the AMD/ATI Fiji family of discrete GPUs.
Starting with the 1.7 release, ROCt uses the DRM render device. This requires the user to belong to the video group. If the user is not yet part of the video group, add the user account with the "sudo usermod -a -G video username" command.
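Before changing group membership, it can help to check whether the account is already in the video group. The following is a minimal sketch, not part of ROCt: the helper name has_group is hypothetical, and the script assumes a POSIX shell with the standard id and grep utilities. It only reports the membership and prints the usermod command; it does not modify the system.

```shell
#!/bin/sh
# has_group is a hypothetical helper, not part of ROCt.
# It succeeds if group $2 appears in the space-separated group list $1.
has_group() {
    # $1 is intentionally unquoted so the list is split into one name per line.
    printf '%s\n' $1 | grep -qx -- "$2"
}

# Check the current user against the video group.
user="$(id -un)"
if has_group "$(id -nG "$user")" video; then
    echo "$user is already in the video group"
else
    echo "$user is not in the video group yet; add it with:"
    echo "  sudo usermod -a -G video $user"
fi
```

A new group membership normally takes effect only in a new login session, so log out and back in after running usermod.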
ROCk Driver
The ROCt library is not a standalone product and requires that you have the correct ROCk driver set installed. We recommend reading the full compatibility and installation details, which are available in the ROCk GitHub repository:
https://github.com/RadeonOpenCompute/ROCK-Radeon-Open-Compute-Kernel-Driver
Building the Thunk
A simple CMake-based system is available for building the thunk. To build the thunk from the ROCT-Thunk-Interface directory, execute:
mkdir -p build
cd build
cmake ..
make
If the hsakmt-roct and hsakmt-roct-dev packages are desired:
mkdir -p build
cd build
cmake ..
make package
make package-dev
Disclaimer
The information contained herein is for informational purposes only, and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD's products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale.
AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
Copyright (c) 2014-2017 Advanced Micro Devices, Inc. All rights reserved.