Support hipLaunchCooperativeKernelMultiDevice()
- Add validation logic for multi-GPU (MGPU) launches to pass a CUDA test
Change-Id: Iccca7fde43493fc3bc6685512d39202271ae3e92
[ROCm/hip commit: 5fe91ccb1b]
Support hipLaunchCooperativeKernelMultiDevice()
- Add hipCooperativeLaunchMultiDeviceNoPreSync and
hipCooperativeLaunchMultiDeviceNoPostSync flag support to pass a CUDA test
Change-Id: If518f11ef2636a2235e5df9e77f879d8ced68102
[ROCm/hip commit: da1444bfc8]
reinterpret_cast<> doesn't create an object, so the texref is actually uninitialized. This may lead to garbage data in some of its struct members.
Initialize it by performing a placement new. The constructor should set all of its members to default values. There's currently no way to extract the channel type, so use single-channel char for now.
Change-Id: I41b305a75bb3f30130324de785099f55b3e130c7
[ROCm/hip commit: 292d008a64]
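For illustration, a minimal sketch of the placement-new fix described in the texref commit above. TexRef, construct_in and demo are hypothetical stand-ins, not the real HIP types or functions; the defaults (one 8-bit channel) only mirror the "single-channel char" choice. The storage is filled with junk first to show that the constructor, not the cast, is what gives the members defined values.

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Hypothetical stand-in for a texture reference; NOT the real HIP struct.
struct TexRef {
    int normalized = 0;        // default: unnormalized coordinates
    int channels = 1;          // default: single channel
    int bits_per_channel = 8;  // default: char-sized channel
};

// Construct a TexRef inside caller-provided storage.
// A reinterpret_cast<TexRef*>(storage) alone would only reinterpret the bytes
// and leave the members holding whatever was there before.
TexRef* construct_in(void* storage) {
    return ::new (storage) TexRef();
}

int demo() {
    // Raw, suitably aligned storage filled with a junk pattern.
    alignas(TexRef) unsigned char storage[sizeof(TexRef)];
    std::memset(storage, 0xAB, sizeof(storage));

    TexRef* ref = construct_in(storage);
    assert(static_cast<void*>(ref) == static_cast<void*>(storage));

    // After construction every member holds its default value.
    int sum = ref->normalized + ref->channels + ref->bits_per_channel;

    ref->~TexRef();  // objects created with placement new are destroyed explicitly
    return sum;
}
```

Calling demo() yields the sum of the default members rather than values derived from the junk pattern, which is the behaviour the commit relies on.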
Also, don't optimize the elapsed-time computation for the case where the
start and stop events are the same, since the command can be an NDRange one.
The HIP directed test will need to be fixed for that.
Change-Id: I64fadd6ab8ab1a490e7a2b7165a591df5a5cf3a2
[ROCm/hip commit: 9692ac6b5f]