Debugging Strategy for Windows Instant Recovery Verify failures

ISSUE

Purpose

This article explains the overall strategy for debugging Windows Instant Recovery (WIR) Verify failures.  It does not give detailed instructions for carrying out each debugging step; specific steps for specific types of problems are explained in other KB articles.

Description

When a WIR instance is set to email a verification report after restores, the system periodically boots the instance in “Verify Mode” to check whether the WIR VM is being restored properly.  When an instance routinely sends verify failure emails, it can be difficult to determine the source of the problem.

Cause

The WIR email verification process boots the instance with a virtual network adapter, waits for the adapter to get an IP address from a virtual DHCP server, connects to the Windows Agent, and then takes a screenshot of the login screen, which is saved and emailed to everyone on the appliance’s email address list.  A verify failure means that the restore process could not connect to the Windows Agent.  Debugging the problem means finding out why the connection could not be established between the appliance and the agent running in the WIR instance.

Some of the more common connection failure causes are:

1.   There is a problem with the restores that prevents the VM from booting.

2.   The WIR VM boots too slowly and the verify procedure times out before a connection can be established with the agent.

3.   The drivers are not available for the virtual network adapter, and it is therefore not functional after the VM boots.

4.   The virtual network adapter gets installed properly, but cannot get a DHCP address from the virtual DHCP server.

5.   The client has a firewall setting that prevents a connection to the virtual client through the virtual network adapter.

Resolution

The default timeout for a verify attempt is 3 minutes (180 seconds).

The debug procedure will differ with each issue, but the following may provide a meaningful general procedure:

1.   Verify that the virtual machine boots up in Verify mode.  To observe the boot, you will need to either increase the timeout value by adding the MaxQemuBoot key (value in seconds) to the master.ini file, or start the VM manually from the command line (the procedure for doing this is the topic of another KB article).  If the VM does not boot to a login prompt after a reasonable time, then there is likely something wrong with the restores.
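To confirm which timeout is in effect before re-running a verify, a small read-only check can help.  The following is a minimal sketch, not an appliance tool: it assumes master.ini parses as a standard INI file, and because this article does not say which section holds MaxQemuBoot, it searches every section.  The function name is illustrative only.

```python
import configparser

DEFAULT_VERIFY_TIMEOUT_S = 180  # the 3-minute default stated in this article


def effective_verify_timeout(ini_text: str) -> int:
    """Return MaxQemuBoot in seconds if any section defines it, else the default.

    Assumption: the text is standard INI. The section that holds the key is
    not named in this article, so every section is searched.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case so "MaxQemuBoot" matches exactly
    parser.read_string(ini_text)
    for section in parser.sections():
        if parser.has_option(section, "MaxQemuBoot"):
            return parser.getint(section, "MaxQemuBoot")
    return DEFAULT_VERIFY_TIMEOUT_S
```

The function only reads text that you pass in (for example, the contents of a copy of master.ini), so it cannot alter the appliance configuration.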

2.   If the VM boots to the login prompt in Verify mode, log in and, after a couple of minutes, verify that the network adapter was set up and has received an IP address from the virtual DHCP server (using ipconfig from the command line).  If the network adapter never gets injected into the WIR instance, then it is likely that the driver is missing on the virtual client.

If the network adapter gets set up correctly, but never gets an IP address from the DHCP server, then that is the problem.  You should begin debugging the problem on the appliance by checking the following:

·         Is the port security level set higher than “No Security”?  If so, change it to “No Security”.

·         Is there an IP address assigned to the WIR instance in the /var/run/qemu-dnsmasq-qemubr.leases file?  If so, then why isn’t the WIR instance getting it?

·         Is there a firewall issue on the WIR instance preventing it from getting a DHCP IP address?
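The leases check above can be scripted.  The sketch below assumes the file uses the standard dnsmasq lease format (one lease per line: expiry time in epoch seconds, MAC address, IP address, hostname, client ID), which the file name suggests but this article does not confirm.  The names Lease, parse_leases and find_lease are illustrative.

```python
from typing import List, NamedTuple, Optional


class Lease(NamedTuple):
    expires: int   # epoch seconds
    mac: str       # lower-cased MAC address
    ip: str
    hostname: str  # "*" when the client sent no hostname


def parse_leases(text: str) -> List[Lease]:
    """Parse dnsmasq-style lease lines, skipping anything that is not a lease."""
    leases = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # blank lines and non-lease entries such as "duid ..."
        leases.append(Lease(int(fields[0]), fields[1].lower(), fields[2], fields[3]))
    return leases


def find_lease(text: str, mac: str) -> Optional[Lease]:
    """Return the lease recorded for the given MAC address, if there is one."""
    wanted = mac.lower()
    for lease in parse_leases(text):
        if lease.mac == wanted:
            return lease
    return None
```

On the appliance, you would read /var/run/qemu-dnsmasq-qemubr.leases and pass its contents to find_lease together with the MAC address of the WIR instance’s virtual adapter.  A lease that is present on the appliance but absent from ipconfig inside the VM suggests the DHCP reply is not reaching the instance.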

3.   If the network adapter is set up and has an IP address within the default 3-minute timeout period, verify that you can connect to the agent from the command line on the appliance by running telnet against the WIR instance’s IP address on port 1743.  You should receive output that looks something like this:

[[email protected] logs.dir]# telnet 192.168.55.23 1743
Trying 192.168.55.23...
Connected to 192.168.55.23 (192.168.55.23).
Escape character is '^]'.
¥A,Connect1745quit
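If telnet is not installed on the appliance, the same reachability check can be approximated with a plain TCP connection.  This is a minimal sketch: it confirms only that something accepts connections on port 1743, not that the agent responds correctly.  The function name is illustrative.

```python
import socket

AGENT_PORT = 1743  # port used in the telnet example in this article


def agent_port_open(host: str, port: int = AGENT_PORT, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, agent_port_open("192.168.55.23") corresponds to the telnet session shown above.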

If the network adapter does not obtain an IP address within the default 3-minute timeout period, but eventually does obtain one, you have two options to resolve the issue:

·         Increase the timeout value

·         Give the WIR instance additional resources, i.e. increase the number of processors allocated to it, or give it additional virtual RAM.  After making the modification, reboot in Verify mode and observe whether the IP address is assigned more quickly during boot.
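To judge whether either option has helped, it can be useful to measure how long the instance takes to become reachable after it is started.  The sketch below polls the agent port from the appliance and reports the elapsed time.  It uses reachability of port 1743 as a stand-in for "the adapter has an IP address and the agent is up", which is an assumption, and the function name and parameters are illustrative.

```python
import socket
import time
from typing import Optional


def seconds_until_reachable(host: str, port: int = 1743,
                            give_up_after: float = 900.0,
                            interval: float = 5.0) -> Optional[float]:
    """Poll host:port until a TCP connection succeeds.

    Returns the elapsed seconds, or None if give_up_after seconds pass first.
    """
    start = time.monotonic()
    while time.monotonic() - start <= give_up_after:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            time.sleep(interval)  # not reachable yet; wait and retry
    return None
```

Start the poll at the same moment the VM is started.  A result above 180 seconds means the default verify timeout would have expired before the agent became reachable.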

4.   If the WIR instance is getting an IP address within the default timeout period, and you can connect to port 1743 using telnet from the command line, then you should look in the logs on the WIR instance and the appliance for the source of the problem.
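The procedure above can be summarized as a simple decision sequence.  The following sketch encodes it as a function over observations that you collect manually; the Observations fields and the returned wording are illustrative, and attributing an unreachable agent port to a firewall follows the list of common causes rather than a confirmed diagnosis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observations:
    boots_to_login: bool             # step 1: VM reaches a login prompt
    adapter_present: bool            # step 2: virtual adapter shows up in ipconfig
    ip_address: Optional[str]        # step 2: address from the virtual DHCP server
    seconds_to_ip: Optional[float]   # how long the address took to appear
    agent_port_reachable: bool       # step 3: connection to port 1743 succeeds


def next_step(obs: Observations, timeout_s: float = 180.0) -> str:
    """Map observations to the next area to investigate."""
    if not obs.boots_to_login:
        return "restores: VM does not boot to a login prompt"
    if not obs.adapter_present:
        return "driver: network adapter driver is likely missing on the virtual client"
    if obs.ip_address is None:
        return "dhcp: check port security, the leases file, and the instance firewall"
    if obs.seconds_to_ip is not None and obs.seconds_to_ip > timeout_s:
        return "timeout: increase the timeout or give the instance more resources"
    if not obs.agent_port_reachable:
        return "firewall: agent port is not reachable from the appliance"
    return "logs: check the logs on the WIR instance and the appliance"
```

The checks are ordered as in the steps above, so the first failing observation determines the result.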
