45fa752888
* [hip] implement the hipExtLaunchMultiKernelMultiDevice API
* add a guard to check the HCC version for acquire_locked_hsa_queue() API which was introdued in HCC for ROCm 2.5
* modified code based on the requested changes
* changes to lock all streams before launching kernels for each device and unlock them after the dispatches
* check each stream to be valid before starting to lock all the streams
[ROCm/hip commit: 96dc74897d]