da2607024b
On large-BAR systems, small code objects get better performance from a direct memcpy than from a blit-copy, because of the latencies incurred when doing the blit-copy.