8a575d8d6e
Using a thread_local object for the timestamp stack is problematic: thread-local destructors run before any global destructor, so the object is already invalid while the process is being torn down. rocBLAS uses a global destructor to clean up its loaded HIP modules, and so ends up calling hip_executable_destroy after the timestamp stack has been destructed. As a result, the begin timestamp recorded for that API function is 0.

The solution is to store the phase_enter timestamp in the phase_data instead.

Change-Id: If143f4d123dfb111c72fb20365431d07e73fc570
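
A minimal sketch of the idea behind the fix, not the actual patch: the enter timestamp travels with the per-call record, so reading it back does not depend on any thread_local state still being alive. The names PhaseData, ApiRecord, on_phase_enter and on_phase_exit are hypothetical stand-ins, and the timestamps are passed in as plain integers rather than read from a real clock.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-call phase_data record.
struct PhaseData {
  std::uint64_t enter_timestamp = 0;  // written when the API call is entered
};

// Hypothetical record emitted when the API call returns.
struct ApiRecord {
  std::uint64_t begin;
  std::uint64_t end;
};

// Store the begin timestamp in the record itself instead of pushing it
// onto a thread_local stack.
inline void on_phase_enter(PhaseData& data, std::uint64_t now) {
  data.enter_timestamp = now;
}

// Read the begin timestamp back from the record. This touches no
// thread-local storage, so it stays valid when called from a global
// destructor during process teardown.
inline ApiRecord on_phase_exit(const PhaseData& data, std::uint64_t now) {
  assert(now >= data.enter_timestamp);
  return ApiRecord{data.enter_timestamp, now};
}
```

Under these assumptions, calling on_phase_enter(data, 100) followed by on_phase_exit(data, 250) yields a record whose begin is 100 rather than 0, regardless of whether thread-local objects have already been destroyed.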