On large-BAR systems, use a direct memcpy for small code objects. It performs better than the blit copy, whose latency dominates at small sizes. [ROCm/ROCR-Runtime commit: da2607024b]
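
The size-based choice described in this commit message could look roughly like the sketch below. This is an illustration only, not the code from the commit: `CopyCodeObject`, `BlitCopyFn`, `CopyPath`, and the 64 KiB value of `kSmallCopyThreshold` are all hypothetical names and numbers, and the blit path is represented by a caller-supplied callback rather than a real runtime call.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>

// Hypothetical cutoff for "small" code objects; the commit message does not
// state the actual value used by the runtime.
constexpr std::size_t kSmallCopyThreshold = 64 * 1024;

// Reports which path was taken, so callers and tests can observe the choice.
enum class CopyPath { kDirectMemcpy, kBlit };

// Stand-in for the GPU blit copy (assumption: the real call has a different
// signature and is asynchronous).
using BlitCopyFn = std::function<void(void*, const void*, std::size_t)>;

// Copies a code object to device-visible memory.
// On a large-BAR system the destination is CPU-addressable, so a small copy
// can go through memcpy and skip the fixed latency of setting up a blit.
CopyPath CopyCodeObject(void* dst, const void* src, std::size_t size,
                        bool large_bar, const BlitCopyFn& blit_copy) {
  if (large_bar && size <= kSmallCopyThreshold) {
    std::memcpy(dst, src, size);
    return CopyPath::kDirectMemcpy;
  }
  // Larger copies, or systems without large BAR, keep using the blit path.
  blit_copy(dst, src, size);
  return CopyPath::kBlit;
}
```

The threshold exists because the blit path pays a roughly fixed setup cost per copy, which only amortizes once the transfer is large enough.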