1883f736ad
Problem: When a TheRock-based PyTorch package is installed alongside amdsmi, importing torch causes a double-free crash on exit (GitHub issue ROCm/TheRock#2269).

Root cause: Both librocm_smi64.so and libamd_smi.so export the C++ static member 'amd::smi::Device::devInfoTypesStrings'. When the libraries are loaded with RTLD_GLOBAL, the dynamic linker resolves libamd_smi.so's reference to this symbol to the one in librocm_smi64.so. This causes:
1. librocm_smi64.so registers its destructor for devInfoTypesStrings
2. libamd_smi.so also registers a destructor, but for the SAME address
3. On exit, both destructors run on the same object -> double-free

Fix: Change devInfoTypesStrings from a class static member to a file-local static variable. This gives the symbol internal linkage, so it is no longer exported and the two libraries can no longer collide on it.

Changes:
- rocm_smi_device.h: Remove the static member declaration
- rocm_smi_device.cc: Change from 'Device::devInfoTypesStrings' to file-local 'static const std::map<...> devInfoTypesStrings'
- rocm_smi.cc: Remove the global alias to the (now removed) class member

Tested on gfx1151. `import torch` crashed on exit before the fix and no longer crashes after it.
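
For illustration, the linkage change described under Fix can be sketched roughly as follows. This is a minimal sketch, not the actual diff: the 'sketch' namespace, the enum values, the map's key/value types, and the DevInfoTypeName() helper are assumptions made up for the example. Only the name devInfoTypesStrings and the move from a class static member to a file-local static come from this change.

```cpp
// Illustrative sketch only. Everything except the name devInfoTypesStrings
// is a hypothetical stand-in, not the real rocm_smi code.
#include <cassert>
#include <map>
#include <string>

namespace sketch {

// Hypothetical stand-in for the real device info type enum.
enum DevInfoTypes { kDevPerfLevel, kDevOverDriveLevel };

// Before (shown as a comment): a class static data member has external
// linkage, so a shared library built this way exports the symbol and
// another library can be bound to the same object under RTLD_GLOBAL.
//
//   class Device {
//     static const std::map<DevInfoTypes, std::string> devInfoTypesStrings;
//   };
//   const std::map<DevInfoTypes, std::string> Device::devInfoTypesStrings = {...};

// After: a namespace-scope 'static' variable has internal linkage. Each
// library gets its own private copy with its own destructor registration.
static const std::map<DevInfoTypes, std::string> devInfoTypesStrings = {
    {kDevPerfLevel, "kDevPerfLevel"},
    {kDevOverDriveLevel, "kDevOverDriveLevel"},
};

// Hypothetical accessor defined in the same translation unit as the map.
const std::string& DevInfoTypeName(DevInfoTypes type) {
  auto it = devInfoTypesStrings.find(type);
  assert(it != devInfoTypesStrings.end() && "unknown DevInfoTypes value");
  return it->second;
}

}  // namespace sketch
```

Because the map is no longer a class member, code outside the defining .cc file cannot name it directly, which is presumably why the alias in rocm_smi.cc had to be removed. One way to confirm the symbol is gone from the dynamic symbol table would be to inspect the built library with a tool such as 'nm -D --demangle'.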