VMware Change Block Tracking (CBT) bug 2090639 documentation
Serious VMware Change Block Tracking (CBT) bug (2090639) - ReliableDR Replication impact
Potential impacts of the VMware CBT bug on ReliableDR Replication jobs.
Description of VMware bug 2090639
VMware recently released a knowledge base article describing a bug that potentially has a significant impact on any product relying on the VMware API for Data Protection. VMware has confirmed the issue in the KB article titled “QueryChangedDiskAreas command returns incorrect sectors after extending virtual machine vmdk file with Changed Block Tracking (CBT) enabled” (2090639). RDR versions 3.x use this API to obtain the blocks in use for full replications in ReliableDR Replication jobs.
“This issue occurs on a virtual machine with Changed Block Tracking (CBT) enabled, when extending a virtual disk (vmdk) file to a size strictly above 128 GB, due to the block tracking information being recalculated incorrectly.”
ReliableDR Replication potential impacts
Initial ReliableDR replication:
The initial full replication (or seeding) of a VM on a VMFS volume may be impacted by this bug if CBT had been enabled sometime in the past AND the VMDK was expanded as explained in 2090639 prior to the initial full replication (or seeding) by ReliableDR.
Subsequent Full replications:
After the initial full ReliableDR replication, you may be impacted by this bug only if the VM’s VMDK is expanded as explained in 2090639, AND you then run a Full replication of the VM after the VMDK expansion by manually changing the VM Full setting to On.
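The two cases above come down to the same set of conditions, which can be sketched as a simple check. This is an illustrative sketch only: the `VmState` fields and the `may_be_impacted` function are hypothetical names, not part of ReliableDR or any VMware API, and the values would have to be filled in by hand from what you know about the VM. The VMFS condition is taken from the Q&A in this article.

```python
from dataclasses import dataclass


@dataclass
class VmState:
    """Hypothetical record of what is known about a replicated VM."""
    on_vmfs: bool                           # VM disks live on a VMFS volume
    cbt_was_enabled: bool                   # CBT was enabled at some point in the past
    vmdk_expanded_per_kb: bool              # VMDK was expanded as described in 2090639
    full_replication_after_expansion: bool  # initial or manual Full replication ran afterwards


def may_be_impacted(vm: VmState) -> bool:
    """Return True only when every condition described in this article holds."""
    return (
        vm.on_vmfs
        and vm.cbt_was_enabled
        and vm.vmdk_expanded_per_kb
        and vm.full_replication_after_expansion
    )


if __name__ == "__main__":
    vm = VmState(
        on_vmfs=True,
        cbt_was_enabled=True,
        vmdk_expanded_per_kb=True,
        full_replication_after_expansion=True,
    )
    print("Reset CBT and re-run a full replication:", may_be_impacted(vm))
```

A result of True means only that the VM matches the conditions described here; it does not prove that the replica is corrupted.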
Identifying Potential impact
The good news is that ReliableDR will automatically detect many types of VM corruption during its automated RPO/RTO testing of the replicated VM. However, it is possible that some types of corruption may not be detected by these automated tests. In this case, you will need to manually reset CBT on the VM and perform a new full ReliableDR replication.
ReliableDR Replication workaround
If you know that your VM’s VMDK was expanded as described prior to the first full replication of the VM by ReliableDR, or if you know that you have expanded the VMDK as described and have subsequently initiated another manual full replication, then as a precaution, you should manually reset change block tracking (CBT) of the VM and run a new full replication of the data. For details on manually resetting CBT, see this VMware article: KB2139574.
Q: Will Unitrends provide a fix?
A: ReliableDR 4.0, available in Q1 2015, has a fix for this issue.
Q: Are the other ReliableDR jobs impacted?
A: No. ReliableDR Storage Replication, Unitrends Backup, and CertifiedReplica jobs are not impacted by this bug.
Q: If I reset CBT and run a Full replication, could I encounter this issue again on the VM?
A: Yes, this issue occurs any time the VMDK is expanded past the 128GB boundary or any power of 2 above 128GB (e.g. 256GB, 512GB, 1024GB, and so on).
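As a rough aid, the boundary rule in this answer can be sketched as a small check. This is a minimal sketch that assumes an expansion "crosses" a boundary when the old size is at or below it and the new size is strictly above it; the function name `crossed_boundaries` is hypothetical and is not part of ReliableDR or any VMware tool.

```python
from typing import List


def crossed_boundaries(old_gb: float, new_gb: float) -> List[int]:
    """Return the boundaries (128, 256, 512, ... GB) crossed by a VMDK expansion.

    Assumption: a boundary counts as crossed when old_gb <= boundary < new_gb.
    """
    if new_gb <= old_gb:
        return []  # not an expansion
    crossed = []
    boundary = 128
    while boundary < new_gb:
        if old_gb <= boundary:
            crossed.append(boundary)
        boundary *= 2  # next power-of-2 multiple: 256, 512, 1024, ...
    return crossed


if __name__ == "__main__":
    for old, new in [(100, 200), (130, 200), (200, 600)]:
        print(old, "->", new, "GB crosses:", crossed_boundaries(old, new))
```

An empty list suggests the expansion does not match the pattern described above. A non-empty list suggests resetting CBT and running a new full replication as a precaution.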
Q: What if an incremental fails and I need to resync the disk size, will I risk data corruption?
A: No. Resyncing the replica’s disk size does not trigger the VMware CBT issue. Whenever the VMDK is resized, the next incremental ReliableDR Replication fails and you are prompted to resync the disk size. This standard resync procedure does not pose any risk.
Q: Is there any problem with ReliableDR Replication jobs where the VMs are running on NFS or RDM disks?
A: No. Only VMs that are in VMFS volumes encounter this VMware CBT issue.