be743bcd59
- Do not use an extra barrier to detect graph end. If it's a kernel
node, we can use a completion signal for the last packet. Saves roughly
6us per graph launch for the Phantom testcase.
Change-Id: I5e0c2479d9964fbeda86ed97533f6718f49a7f91
[ROCm/clr commit: c3bd229f4f]