Appliance reboots/resets unexpectedly

SUMMARY

In certain cases, the ipmiutil watchdog timer service (ipmiutil_wdt) is left enabled, which can cause unexpected behavior such as a hard reset of the DPU.

ISSUE

The appliance reboots or resets for no apparent reason, and the BMC event log shows Watchdog_2 "Timer interrupt" entries followed by "Hard Reset action" entries, as in the following example.

# ipmiutil sel -e
ipmiutil ver 2.97
isel: version 2.97
-- BMC version 1.86, IPMI version 2.0
SEL Ver 0 Support 03, Size = 512 records (Used=6, Free=506)
RecId Date/Time_______ SEV Src_ Evt_Type___ Sens# Evt_detail - Trig [Evt_data]
0001 05/05/16 00:29:31 MAJ BMC  Watchdog_2 #ca Timer interrupt 6f [c8 04 ff]
0002 05/05/16 00:29:32 CRT BMC  Watchdog_2 #ca Hard Reset action 6f [c1 04 ff]
0003 05/05/16 00:36:30 MAJ BMC  Watchdog_2 #ca Timer interrupt 6f [c8 04 ff]
0004 05/05/16 00:36:31 CRT BMC  Watchdog_2 #ca Hard Reset action 6f [c1 04 ff]
0005 05/05/16 00:44:31 MAJ BMC  Watchdog_2 #ca Timer interrupt 6f [c8 04 ff]
0006 05/05/16 00:44:32 CRT BMC  Watchdog_2 #ca Hard Reset action 6f [c1 04 ff]
ipmiutil sel, completed successfully
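
On an appliance with a longer event log, it may help to filter out everything except the watchdog resets. The following is a minimal sketch, assuming the SEL lines have the same layout as the sample above; the function name wdt_resets is hypothetical and not part of ipmiutil.

```shell
# wdt_resets: hypothetical helper. Reads `ipmiutil sel -e` output on stdin
# and prints only the Watchdog_2 records whose detail is "Hard Reset action".
wdt_resets() {
    grep 'Watchdog_2' | grep 'Hard Reset action'
}

# Usage on the appliance (needs ipmiutil and root privileges):
#   ipmiutil sel -e | wdt_resets
```

Each line printed corresponds to one reset triggered by the watchdog timer; an empty result suggests the resets have a different cause.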

RESOLUTION

First, verify that the watchdog timer service is running.

# service ipmiutil_wdt status
ipmiutil ver 2.97
iwdt ver 2.97
-- BMC version 1.86, IPMI version 2.0
wdt data: 04 01 00 00 84 03 84 03
Watchdog timer is stopped for use with SMS/OS. Logging
               pretimeout is 0 seconds, pre-action is None
               timeout is 90 seconds, counter is 90 seconds
               action is Hard Reset

ipmiutil wdt, completed successfully
ipmiutil_wdt is running...
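
The "wdt data" line in the status output carries the same settings in raw form. The sketch below decodes those eight bytes so you can cross-check the readable lines. It assumes the bytes follow the IPMI 2.0 Get Watchdog Timer response layout (timer use, timer actions, pre-timeout in seconds, expiration flags, then initial and present countdown as little-endian counts of 100 ms); the function name decode_wdt is hypothetical.

```shell
# decode_wdt: hypothetical helper. Takes the eight hex bytes from the
# "wdt data:" line as arguments and prints the fields they encode.
# Assumption: byte layout per the IPMI 2.0 Get Watchdog Timer response.
decode_wdt() {
    [ "$#" -eq 8 ] || { echo "usage: decode_wdt b1 b2 b3 b4 b5 b6 b7 b8" >&2; return 1; }
    use=$((0x$1))                      # timer use; bit 6 set = timer running
    act=$((0x$2))                      # timer actions; bits 2..0 = timeout action
    pre=$((0x$3))                      # pre-timeout interval in seconds
    init=$(( (0x$6 << 8) | 0x$5 ))     # initial countdown, LSB first, 100 ms units
    now=$(( (0x$8 << 8) | 0x$7 ))      # present countdown, LSB first, 100 ms units

    case $((use & 7)) in
        1) u="BIOS FRB2" ;;
        2) u="BIOS/POST" ;;
        3) u="OS Load" ;;
        4) u="SMS/OS" ;;
        5) u="OEM" ;;
        *) u="reserved" ;;
    esac
    case $((act & 7)) in
        0) a="No action" ;;
        1) a="Hard Reset" ;;
        2) a="Power down" ;;
        3) a="Power cycle" ;;
        *) a="reserved" ;;
    esac
    if [ $((use & 0x40)) -ne 0 ]; then s="running"; else s="stopped"; fi

    echo "use=$u state=$s action=$a pretimeout=${pre}s timeout=$((init / 10))s counter=$((now / 10))s"
}

# Bytes from the status output above:
decode_wdt 04 01 00 00 84 03 84 03
# prints: use=SMS/OS state=stopped action=Hard Reset pretimeout=0s timeout=90s counter=90s
```

The decoded values agree with the readable lines of the status output: a 90 second timeout (0x0384 = 900 counts of 100 ms) with a Hard Reset action.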

Once you've verified that the ipmiutil_wdt service is running, as indicated by the final line of the status output above, stop the service and disable it from starting at boot by issuing the following commands.

# service ipmiutil_wdt stop
Stopping /usr/bin/ipmiutil wdt:

# chkconfig ipmiutil_wdt off
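
The chkconfig command also prints nothing on success. If you want to confirm that the service will not start at boot, a check along the following lines may help. It is a sketch that assumes the usual SysV `chkconfig --list` output format (service name followed by 0:off through 6:off); the function name wdt_boot_disabled is hypothetical.

```shell
# wdt_boot_disabled: hypothetical helper. Reads the output of
# `chkconfig --list ipmiutil_wdt` on stdin and succeeds only if the
# service is listed and no runlevel is marked "on".
wdt_boot_disabled() {
    out=$(cat)
    # Fail if the service does not appear in the listing at all.
    printf '%s\n' "$out" | grep -q '^ipmiutil_wdt' || return 1
    # Succeed only when no runlevel entry ends in ":on".
    ! printf '%s\n' "$out" | grep -q ':on'
}

# Usage on the appliance:
#   chkconfig --list ipmiutil_wdt | wdt_boot_disabled && echo "disabled at boot"
```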

Finally, verify that the service is stopped, because the init script does not print the result of the stop command.

# service ipmiutil_wdt status
ipmiutil ver 2.97
iwdt ver 2.97
-- BMC version 1.86, IPMI version 2.0
wdt data: 01 00 1e 00 b0 04 b0 04
Watchdog timer is stopped for use with BIOS FRB2. Logging
               pretimeout is 30 seconds, pre-action is None
               timeout is 120 seconds, counter is 120 seconds
               action is No action

ipmiutil wdt, completed successfully
ipmiutil_wdt is stopped
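
If you need to run this check on several appliances, the final status line can be tested in a script. The following is a minimal sketch, assuming the init script reports "ipmiutil_wdt is stopped" exactly as in the output above; the function name wdt_is_stopped is hypothetical.

```shell
# wdt_is_stopped: hypothetical helper. Reads `service ipmiutil_wdt status`
# output on stdin and succeeds only if the init script reports "is stopped".
wdt_is_stopped() {
    grep -q 'ipmiutil_wdt is stopped'
}

# Usage on the appliance:
#   service ipmiutil_wdt status | wdt_is_stopped && echo "watchdog service is stopped"
```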

 

CAUSE

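In certain cases, the ipmiutil watchdog timer service (ipmiutil_wdt) has been left enabled. While the service is running, the BMC watchdog is configured with a 90 second timeout and a Hard Reset action, as shown in the status output under RESOLUTION, so the BMC hard-resets the DPU whenever the timer expires without being reset.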
