German Andryeyev 008133cf41 SWDEV-305016 - Improve MGPU scaling in TensorFlow
Add a size threshold for ROCr/SDMA P2P transfers. The ROCr copy path
requires extra barriers in compute for synchronization, which costs
performance on tiny transfers.
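The threshold idea can be sketched roughly as follows. This is an illustration only, not the code from this change: the enum, the function name, and the 64 KiB cutoff are all hypothetical, and the commit message does not say which path small transfers take instead (a compute-based copy is assumed here).

```cpp
#include <cstddef>

// Hypothetical names for illustration; not identifiers from the runtime.
enum class CopyPath {
  ComputeCopy,  // assumed fallback for small transfers (no extra barriers)
  RocrSdma      // ROCr/SDMA path, which needs extra compute barriers
};

// Assumed cutoff; the actual threshold value is not stated in the commit.
constexpr std::size_t kP2PSdmaThresholdBytes = 64 * 1024;

// Picks the SDMA path only when the transfer is large enough to amortize
// the synchronization cost; smaller transfers take the fallback path.
constexpr CopyPath SelectP2PCopyPath(
    std::size_t bytes, std::size_t threshold = kP2PSdmaThresholdBytes) {
  return (bytes >= threshold) ? CopyPath::RocrSdma : CopyPath::ComputeCopy;
}
```

Keeping the cutoff as a parameter makes it easy to tune per platform, since the break-even size depends on the barrier overhead.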
Reduce the active wait time to 10 us. TensorFlow uses an extra thread
per GPU that calls hipEventQuery() constantly, so longer active waits
in ROCr hurt CPU performance.
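A bounded active wait of this kind can be sketched as a two-phase wait: spin briefly, then give the CPU back between polls. This is a minimal sketch under stated assumptions, not the runtime's implementation: `WaitForSignal`, the `std::atomic<bool>` flag standing in for a completion signal, the 50 us sleep interval, and the timeout are all hypothetical; only the 10 us active-wait budget comes from the commit message.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical helper: busy-polls `done` for at most `active_wait`, then
// sleeps between polls until `timeout` expires. Returns true if `done`
// was observed set.
inline bool WaitForSignal(
    const std::atomic<bool>& done,
    std::chrono::microseconds active_wait = std::chrono::microseconds(10),
    std::chrono::milliseconds timeout = std::chrono::milliseconds(100)) {
  using Clock = std::chrono::steady_clock;
  const auto start = Clock::now();

  // Phase 1: active wait. Lowest latency, but burns a CPU core.
  while (Clock::now() - start < active_wait) {
    if (done.load(std::memory_order_acquire)) return true;
  }

  // Phase 2: release the CPU between polls (assumed 50 us interval).
  while (Clock::now() - start < timeout) {
    if (done.load(std::memory_order_acquire)) return true;
    std::this_thread::sleep_for(std::chrono::microseconds(50));
  }
  return done.load(std::memory_order_acquire);
}
```

With one polling thread per GPU, a shorter spin phase means those threads spend less time competing with the application for CPU cores.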

Change-Id: I9020358438615fa2d4617f862f00a562f0a588e7
2021-12-08 11:59:37 -05:00
Description
No description provided
282 MiB
Languages
C++ 67.5%
C 20.6%
Python 6.6%
CMake 3.4%
Shell 0.6%
Other 1.1%