Support hipLaunchCooperativeKernelMultiDevice()
- Add validation logic for MGPU launches to pass a cuda test
Change-Id: Iccca7fde43493fc3bc6685512d39202271ae3e92
Support hipLaunchCooperativeKernelMultiDevice()
- Add hipCooperativeLaunchMultiDeviceNoPreSync and
hipCooperativeLaunchMultiDeviceNoPostSync support to pass a cuda test
Change-Id: If518f11ef2636a2235e5df9e77f879d8ced68102
reinterpret_cast<> doesn't create an object, so the texref is actually unitiliazed. This may lead to garbage data in some of its struct members.
Initialize it by performing a placement new. The constructer should set all of its members to default values. There's no way currently to extract the channel type, so use single channel char for now.
Change-Id: I41b305a75bb3f30130324de785099f55b3e130c7
And also don't optimize the case where start==stop event to compute
elapsed time since the command can be a NDRange one.
HIP directed test will need to be fixed for that.
Change-Id: I64fadd6ab8ab1a490e7a2b7165a591df5a5cf3a2