008133cf41583d3454f2c696992b5ea3e1b8c171
Add a threshold for ROCR/SDMA P2P transfers.

The ROCR copy path requires extra barriers in compute for synchronization, which costs performance on tiny transfers.

Reduce active wait time to 10us. TensorFlow uses an extra thread per GPU with constant hipEventQuery() calls, so longer active waits in ROCr hurt CPU performance.

Change-Id: I9020358438615fa2d4617f862f00a562f0a588e7
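The two ideas in this commit message, a size threshold for choosing the copy path and a short bounded active wait, can be illustrated with a minimal C++ sketch. This is not the code from the commit: the names `SelectP2PCopyPath`, `ActiveWait`, and `CopyPath`, and the 16 KiB threshold value, are hypothetical, and the alternative to the ROCR/SDMA path is assumed here to be a compute-kernel copy. Only the 10us wait budget comes from the message itself.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Assumed threshold: the commit message does not state the actual value.
constexpr std::size_t kSdmaP2PThresholdBytes = 16 * 1024;
// Active (busy-poll) wait budget of 10us, as stated in the commit message.
constexpr std::chrono::microseconds kActiveWaitBudget{10};

// Hypothetical copy paths; the non-SDMA alternative is an assumption.
enum class CopyPath { kKernelCopy, kSdmaCopy };

// Transfers below the threshold avoid the ROCR/SDMA path, whose extra
// compute barriers dominate the cost of tiny copies.
constexpr CopyPath SelectP2PCopyPath(
    std::size_t bytes, std::size_t threshold = kSdmaP2PThresholdBytes) {
  return bytes < threshold ? CopyPath::kKernelCopy : CopyPath::kSdmaCopy;
}

// Polls is_done() until it returns true or the budget expires.
// Returns true if completion was observed while polling; on false the
// caller is expected to fall back to a blocking (non-spinning) wait.
inline bool ActiveWait(const std::function<bool()>& is_done,
                       std::chrono::microseconds budget = kActiveWaitBudget) {
  const auto deadline = std::chrono::steady_clock::now() + budget;
  do {
    if (is_done()) return true;
  } while (std::chrono::steady_clock::now() < deadline);
  return false;
}
```

A caller would pick the path with `SelectP2PCopyPath(size)` before issuing the copy, and call `ActiveWait` with a completion check before blocking. Keeping the spin budget small limits how long a polling thread, such as one that calls hipEventQuery() in a loop, competes for CPU time.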
Description
No description provided
Languages
C++ 67.5%
C 20.6%
Python 6.6%
CMake 3.4%
Shell 0.6%
Other 1.1%