Virtual machine reports a "BUG: soft lockup"

 

Issue

  • Virtual machine guest suffers multiple soft lockups at the same time
  • The system experiences a kernel panic due to a soft lockup.
  • Logs show "BUG: soft lockup" messages (examples from different sources are listed under Cause).

These kernel messages indicate that a vCPU did not get to execute for N seconds.

Resolution

Under normal circumstances, these messages may go away if the load decreases.
This 'soft lockup' can happen if the kernel is busy working on a massive number of objects that need to be scanned, freed, or allocated.
The stack traces of those tasks can give a first idea of what the tasks were doing. However, to examine the cause behind the messages, a kernel dump is needed.

While these messages cannot be disabled entirely, in some situations increasing the time before these soft lockups are fired can relax the situation.
 
To do so, increase the following sysctl parameter: kernel.watchdog_thresh
 
The default value for this parameter is 10, and doubling it might be a good start.
 
For example, to change the value at runtime:

server1:~ # echo 20 > /proc/sys/kernel/watchdog_thresh

or, to make the change persistent across reboots:

server1:~ # echo "kernel.watchdog_thresh=20" > /etc/sysctl.d/99-watchdog_thresh.conf
server1:~ # sysctl -p /etc/sysctl.d/99-watchdog_thresh.conf
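
Before changing the value, it can help to check what is currently set and what it implies. The following is a minimal sketch, assuming the relationship given in the kernel sysctl documentation, where the soft lockup threshold is twice kernel.watchdog_thresh. The function name softlockup_threshold is hypothetical, and the fallback to 10 is only an assumption for systems where the proc file cannot be read.

```shell
#!/bin/sh
# Hypothetical helper: print the soft lockup threshold (in seconds)
# implied by a given watchdog_thresh value (threshold = 2 * watchdog_thresh).
softlockup_threshold() {
    echo $(( $1 * 2 ))
}

# Read the live value; fall back to the default of 10 if the file
# is not readable (assumption for illustration only).
current=$(cat /proc/sys/kernel/watchdog_thresh 2>/dev/null || echo 10)

echo "watchdog_thresh=${current}: soft lockup reported after $(softlockup_threshold "$current")s"
echo "watchdog_thresh=20: soft lockup reported after $(softlockup_threshold 20)s"
```

With the default of 10 this gives 20 seconds, which matches the figure quoted in the Cause section; with 20 it gives 40 seconds.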

For more information on how to configure and capture a kernel dump, please see: Configure crashkernel memory for kernel core dump analysis

Cause

A 'soft lockup' is defined as a bug that causes the kernel to loop in kernel mode for more than 20 seconds without giving other tasks a chance to run.
The watchdog daemon will send a non-maskable interrupt (NMI) to all CPUs in the system, which, in turn, print the stack traces of their currently running tasks.

 
Example messages from different sources:

BUG: soft lockup - CPU#6 stuck for 73s! [flush-253:0:1207]
BUG: soft lockup - CPU#7 stuck for 74s! [processname:15706]
BUG: soft lockup - CPU#5 stuck for 63s! [processname:25582]
BUG: soft lockup - CPU#0 stuck for 64s! [processname:15789]
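
To see at a glance which CPUs and tasks were affected and for how long, messages in this format can be reduced to their key fields. This is a rough sketch, assuming the message layout shown in the examples in this article; the function name parse_soft_lockups is hypothetical, and other kernel versions may word the message differently.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print
# "<cpu> <seconds> <task:pid>" for every soft lockup message.
# Any leading timestamp or prefix before "BUG:" is ignored.
parse_soft_lockups() {
    sed -n 's/.*BUG: soft lockup - CPU#\([0-9][0-9]*\) stuck for \([0-9][0-9]*\)s! \[\([^]]*\)\].*/\1 \2 \3/p'
}

# Demonstration using two of the sample messages from this article.
# On a live system the input could instead come from: dmesg | parse_soft_lockups
parse_soft_lockups <<'EOF'
BUG: soft lockup - CPU#6 stuck for 73s! [flush-253:0:1207]
BUG: soft lockup - CPU#7 stuck for 74s! [processname:15706]
EOF
# prints:
# 6 73 flush-253:0:1207
# 7 74 processname:15706
```

Several CPUs reporting similar stuck times at once, as in the examples, fits the first symptom listed under Issue.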
