Virtual machine reports a "BUG: soft lockup"
Issue
- Virtual machine guest suffers multiple soft lockups at the same time
- We are experiencing a kernel panic due to a soft lockup.
Logs show messages like the examples listed under Cause below (taken from different sources).
Cause: the kernel reports that a vCPU did not get to execute for N seconds.
Resolution
Under normal circumstances, these messages may go away if the load decreases.
This 'soft lockup' can happen if the kernel is busy working on a very large number of objects that need to be scanned, freed, or allocated.
The stack traces of those tasks can give a first idea of what the tasks were doing. However, to examine the cause behind the messages, a kernel dump is needed.
While these messages cannot be disabled entirely, in some situations increasing the time before these soft lockups are fired can relax the situation.
To do so, increase the following sysctl parameter: kernel.watchdog_thresh
The default value for this parameter is 10; doubling it might be a good start.
e.g.
server1:~ # echo 20 > /proc/sys/kernel/watchdog_thresh
or
server1:~ # echo "kernel.watchdog_thresh=20" > /etc/sysctl.d/99-watchdog_thresh.conf
server1:~ # sysctl -p /etc/sysctl.d/99-watchdog_thresh.conf
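The doubling step can be sketched as a small shell helper. This is only an illustration: the function name doubled_thresh_line is hypothetical, and the helper prints the sysctl line instead of writing it, so it can be reviewed before being placed in a file such as the one used above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): given the current
# kernel.watchdog_thresh value in seconds, print a sysctl line
# that sets the parameter to twice that value.
doubled_thresh_line() {
    current="$1"    # current threshold in seconds, e.g. 10
    printf 'kernel.watchdog_thresh=%s\n' "$((current * 2))"
}
```

For example, running doubled_thresh_line "$(cat /proc/sys/kernel/watchdog_thresh)" on a system that still uses the default of 10 prints kernel.watchdog_thresh=20, the same line written to the sysctl.d file in the example.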
For more information on how to configure and capture a kernel dump, please see: Configure crashkernel memory for kernel core dump analysis
Cause
A 'soft lockup' is a bug that causes the kernel to loop in kernel mode for more than 20 seconds without giving other tasks a chance to run.
The watchdog daemon will send a non-maskable interrupt (NMI) to all CPUs in the system, which in turn print the stack traces of their currently running tasks.
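The 20 seconds mentioned above follow from the default kernel.watchdog_thresh of 10: according to the kernel documentation, the soft lockup threshold is twice the watchdog_thresh value. A minimal sketch of that arithmetic, using a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical helper: print the soft lockup threshold in seconds
# for a given kernel.watchdog_thresh value (threshold = 2 * thresh).
soft_lockup_threshold() {
    thresh="$1"    # kernel.watchdog_thresh in seconds
    echo "$((thresh * 2))"
}
```

With the default of 10 this gives 20 seconds. Raising the parameter to 20 means a CPU has to be stuck for 40 seconds before a soft lockup is reported.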
BUG: soft lockup - CPU#6 stuck for 73s! [flush-253:0:1207]
BUG: soft lockup - CPU#7 stuck for 74s! [processname:15706]
BUG: soft lockup - CPU#5 stuck for 63s! [processname:25582]
BUG: soft lockup - CPU#0 stuck for 64s! [processname:15789]
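When many such messages appear, it can help to condense them to the CPU number, the stuck duration, and the task. The following is a sketch that assumes the message format shown in the examples above; the function name summarize_soft_lockups is hypothetical, and other kernel versions may word the message differently.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print one
# summary line per soft lockup message. Lines that do not match the
# assumed "BUG: soft lockup - CPU#N stuck for Ns! [task]" format are
# silently skipped.
summarize_soft_lockups() {
    sed -n 's/.*BUG: soft lockup - CPU#\([0-9]*\) stuck for \([0-9]*\)s! \[\(.*\)]$/cpu=\1 seconds=\2 task=\3/p'
}
```

It can be fed from the kernel ring buffer, for example with dmesg | summarize_soft_lockups.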