Used reinterpret_cast<uint32_t*> for numBlocks, as expected by the hipOccupancyMaxActiveBlocksPerMultiprocessor() API.
Simple test for the hipLaunchCooperativeKernelMultiDevice API.
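For reference, the numBlocks cast can be sketched in plain C++ without a GPU toolchain. This is only an illustration: mockOccupancyQuery and queryBlocks are hypothetical names, the mock is not the HIP API, and its block-count formula is made up. The only thing it is meant to show is an int variable being passed to a function that takes a uint32_t* out-parameter.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for an occupancy query that reports its result
// through a uint32_t* out-parameter. It is NOT the HIP API; it only mimics
// the parameter type so the cast can be shown on the host.
inline int mockOccupancyQuery(std::uint32_t* numBlocks, std::uint32_t blockSize) {
    if (numBlocks == nullptr || blockSize == 0) {
        return 1;  // nonzero means error in this sketch
    }
    *numBlocks = 1024u / blockSize;  // made-up formula, for illustration only
    return 0;
}

// Writing to an int through a uint32_t* is only well-defined when uint32_t
// is the unsigned counterpart of int, so check that at compile time.
static_assert(std::is_same<std::uint32_t, std::make_unsigned<int>::type>::value,
              "uint32_t must be the unsigned counterpart of int");

// Caller keeps numBlocks as an int and casts its address at the call site.
// Returns the block count, or -1 if the query failed.
inline int queryBlocks(std::uint32_t blockSize) {
    int numBlocks = 0;
    int err = mockOccupancyQuery(reinterpret_cast<std::uint32_t*>(&numBlocks),
                                 blockSize);
    if (err != 0) {
        return -1;
    }
    assert(numBlocks >= 0);  // the value must still fit in a signed int
    return numBlocks;
}
```

With the made-up formula above, queryBlocks(256) returns 4 and queryBlocks(0) returns -1. Declaring numBlocks with the type the API expects would avoid the cast entirely; the cast is useful mainly when the variable is also passed to code that wants an int.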