5b95d227bc
The bug can be reproduced as follows.

In terminal #1, run:

    sudo amd-smi ras --cper --gpu 6 --severity all --folder /tmp/cper_dump --follow

In terminal #2, inject errors:

    while true; do sudo amdgpuras -b 7 -s 1 -m 6 -t 2; sleep 2; done

Terminal #1 starts dumping the CPER entries it captures. After 20 entries have been captured, open terminal #3 and run the same command as in terminal #1:

    sudo amd-smi ras --cper --gpu 6 --severity all --folder /tmp/cper_dump --follow

Terminal #3 produces no output, even though terminal #1 continues to capture and print entries.

The fix:

By the time the command is run in terminal #3, the GPU buffer already holds more than 20 CPER entries. The command starts capturing from the beginning and passes 20 buffers for the entries to be copied into, so the C++ API returns a code indicating that more data is available. The Python CLI should not treat this code as an error; it should go on to print the entries the API returned.

---------

Signed-off-by: Oosman Saeed <oossaeed@amd.com>
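
Note: the handling described under "The fix" can be sketched roughly as below. This is an illustration only, not the actual amd-smi code: the `Status` enum, `fetch_entries`, and `drain` are hypothetical names, and the in-memory list stands in for the GPU CPER buffer. The point it shows is that a "more data" status is handled like success (the returned entries are still printed) rather than as an error.

```python
from enum import Enum, auto


class Status(Enum):
    # Hypothetical status codes; the real API's names and values may differ.
    SUCCESS = auto()
    MORE_DATA = auto()
    ERROR = auto()


def fetch_entries(buffer, cursor, max_entries=20):
    """Hypothetical stand-in for the C++ API call.

    Copies at most max_entries entries starting at cursor and reports
    MORE_DATA when entries remain beyond what was copied.
    """
    chunk = buffer[cursor:cursor + max_entries]
    new_cursor = cursor + len(chunk)
    status = Status.MORE_DATA if new_cursor < len(buffer) else Status.SUCCESS
    return status, chunk, new_cursor


def drain(buffer, max_entries=20):
    """Collect every entry, treating MORE_DATA as a non-error status."""
    printed = []
    cursor = 0
    while True:
        status, chunk, cursor = fetch_entries(buffer, cursor, max_entries)
        if status not in (Status.SUCCESS, Status.MORE_DATA):
            raise RuntimeError(f"CPER fetch failed: {status}")
        # Keep what the API returned, even if more data is still pending.
        printed.extend(chunk)
        if status is Status.SUCCESS:
            break
    return printed
```

With 45 pending entries and 20 buffers per call, `drain` makes three calls and returns all 45 entries; the first two calls report `MORE_DATA` and are not treated as failures.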