Hyper-V 2012 Server with NUMA enabled crashes UEB

SUMMARY

This article describes a problem that causes UEB kernel panics or kernel hang messages on Hyper-V 2012 servers with NUMA spanning enabled, and its resolution. The problem occurs only with Hyper-V Server 2012 / 2012R2 and CentOS6/RHEL6 VMs. CentOS5/RHEL5 VMs do not recognize NUMA, so they are not affected. Prior versions of Hyper-V Server with CentOS6/RHEL6 are not affected either.

ISSUE

UEB kernel panics or kernel hang messages on Hyper-V 2012 servers with NUMA spanning enabled. 

This problem occurs only with Hyper-V Server 2012 / 2012R2 and CentOS6/RHEL6 VMs. CentOS5/RHEL5 VMs do not recognize NUMA, so they are not affected. Prior versions of Hyper-V Server with CentOS6/RHEL6 are not affected either.

Problem

Configuring the UEB VM with more than 6GB of RAM causes a kernel panic with the text "PANIC: early exception 06 rip 10:fffff…   error 0 cr2  0". Typically the UEB runs CentOS6 and the server is Hyper-V 2012 or 2012R2 with NUMA spanning enabled.
Alternatively, with 6GB of RAM or less, the following kernel hang message may appear, and the system hangs during shutdown:
INFO: task flush-253:1:39591 blocked for more than 120 seconds.
Not tainted 2.6.32-504.1.3.el6_bp.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-253:1 D 0000000000000000 0 39591 2 0x00000080
ffff880037d3b9d0 0000000000000046 0000000000000000 ffff88017f857100
ffff880037d3b990 ffffffffa000461c 00003572e88abc43 ffffffff8113ae0c
0000000200b73b70 000000010377f21b ffff8800bd4a8638 ffff880037d3bfd8
Call Trace:
[<ffffffffa000461c>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
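
To check whether a UEB VM has already logged this hang, the system log can be searched for the hung-task warning. The following is a minimal sketch, assuming kernel messages are written to /var/log/messages (the usual location on CentOS6, but an assumption here); count_hung_tasks is a hypothetical helper name, not a UEB command.

```shell
# count_hung_tasks is a hypothetical helper. It prints the number of lines
# in the given log file that contain the hung-task warning text.
count_hung_tasks() {
    grep -c 'blocked for more than 120 seconds' "$1"
}
```

For example, count_hung_tasks /var/log/messages prints 0 when no such warning has been logged.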
 
Resolution

The root cause is a bug in the Hyper-V 2012 NUMA logic that affects any CentOS6 or RHEL6 Linux VM. The workaround is to disable NUMA entirely in the Linux VM. This workaround is included in Hyper-V UEB 9.0.0 and later.
To apply the workaround manually, perform the following steps:
  1. In the UEB Linux VM, edit /boot/grub/grub.conf and add numa=off at the end of each kernel line, after the other kernel parameters. The parameter takes effect the next time the VM boots. Example below.
Before:
kernel /vmlinuz-2.6.32-504.1.3.el6_bp.x86_64 ro root=UUID=56ec91c2-6524-45de-8db4-b15d90d992ae rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_LVM_LV=vg_root/lv_swap  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM quiet loglevel=1 nmi_watchdog=0 rd_NO_PLYMOUTH max_loop=128 elevator=noop
After:
kernel /vmlinuz-2.6.32-504.1.3.el6_bp.x86_64 ro root=UUID=56ec91c2-6524-45de-8db4-b15d90d992ae rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_LVM_LV=vg_root/lv_swap  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM quiet loglevel=1 nmi_watchdog=0 rd_NO_PLYMOUTH max_loop=128 elevator=noop numa=off
 
  2. [Optional] Disable NUMA spanning in "Hyper-V Settings", then restart the Hyper-V service.
  3. [Optional] In the UEB VM settings, under "Memory", check the "Enable Dynamic Memory" option.
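
The edit in step 1 can also be scripted. The following is a minimal sketch, not the procedure UEB itself uses: it assumes a GRUB legacy configuration like the one shown above, and add_numa_off is a hypothetical helper name. It saves a backup next to the file with a .bak suffix and appends numa=off to each kernel line that does not already contain it.

```shell
# add_numa_off is a hypothetical helper, not part of UEB or GRUB.
# Usage: add_numa_off /boot/grub/grub.conf   (run as root)
add_numa_off() {
    conf="$1"
    # Keep a backup of the original file before changing anything.
    cp -p "$conf" "$conf.bak" || return 1
    # On lines whose first word is "kernel", strip trailing whitespace and
    # append numa=off, unless the line already contains numa=off.
    # All other lines (title, root, initrd, comments) pass through unchanged.
    sed '/^[[:space:]]*kernel[[:space:]]/{
/numa=off/!s/[[:space:]]*$/ numa=off/
}' "$conf.bak" > "$conf"
}
```

After running the helper and rebooting the VM, grep numa=off /proc/cmdline prints the running kernel's command line if the parameter was applied. If the result is not as expected, the backup copy can be restored over grub.conf.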
     

References

Related social.technet article about CentOS6 panics on Hyper-V 2012:
https://social.technet.microsoft.com/Forums/windowsserver/en-US/4e7b89d8-62a4-4dcd-9181-c0d186c6060b/centos-63-on-windows-server-2012-hyperv-30-bug-panic-early-exception?forum=winserverhyperv

Microsoft TechNet: CentOS and RHEL virtual machines on Hyper-V:
https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/Supported-CentOS-and-Red-Hat-Enterprise-Linux-virtual-machines-on-Hyper-V
