Remove global program lock to cut kernel launch overhead on MGPUs

The global program lock causes excessively long kernel launch overhead
when multiple threads launch kernels on multi-GPU (MGPU) systems.
Remove it.

This patch depends on a compiler patch that makes LC thread-safe.

Change-Id: Ic8a7374d19112764d6de5d483ec5d07a56661d1b