Backups fail when using EMC Isilon Storage running OneFS

SUMMARY

Backups stop or fail when using EMC Isilon storage running OneFS because inodes are lost.

ISSUE

Backups may pause or fail, or severe issues with the Unitrends backup appliance may be encountered, when EMC Isilon storage is configured as an NFS or CIFS passthru device and the Isilon is using EMC's OneFS file system.

RESOLUTION

Though Unitrends documents EMC storage systems as supported for NFS passthru use in our KB Supported external storage vendors for use with Unitrends Backup appliances, not all storage systems are equal. On an Isilon system, using OneFS-formatted file systems results in inodes in the file system not being properly addressed.

As a best practice, vendor storage should be connected directly to your hypervisor and the UB should be deployed with virtual disks. However, EMC Isilon is not supported by VMware or Citrix for such a connection (as of July 2018; this status may change in the future). Unitrends only supports storage systems that your hypervisor vendor also supports, as well as specific vendors (EMC included) using CIFS or NFS mounts as documented here: Supported external storage vendors for use with Unitrends Backup appliances. Though EMC is generally supported, hypervisor support is not present in this case, so Unitrends' official stance is that this storage system cannot be used for backup storage. However, it may be made to work in some cases. The primary issue is the use of OneFS as the file system.

Passthru storage is supported using XFS, EXT3/4, or NTFS file system formats, but OneFS, which EMC uses by default on Isilon devices, is not supported. If you contact EMC support directly, EMC may provide alternative configuration options for your Isilon, including reconfiguring it with XFS-formatted partitions, and may make other adjustments or recommendations to ensure that the storage can sustain the IO necessary to run our database (typically between 500 and 2000 sustained write IO). If your Isilon cannot meet these requirements even after being converted to XFS partitions, see Move the Unitrends database off of stateless storage and onto a new partition for information about that process.
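To get a rough feel for whether a mounted share can sustain the write rate mentioned above, a small probe along the following lines may help. This is only a sketch under stated assumptions: the function name and the directory you pass in are hypothetical, the 8 KiB block size is chosen to match the default PostgreSQL page size, and a short burst of synchronous writes is not a substitute for a proper storage benchmark or for guidance from EMC or Unitrends support.

```python
import os
import tempfile
import time


def measure_sync_write_rate(directory, operations=200, block_size=8192):
    """Return approximate synchronous writes per second in `directory`.

    Hypothetical helper: writes `operations` blocks of `block_size` bytes
    to a temporary file, forcing each one to storage with fsync.
    """
    payload = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(operations):
            os.write(fd, payload)
            os.fsync(fd)  # force the write through to the storage backend
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)  # always clean up the temporary file
    return operations / elapsed if elapsed > 0 else float("inf")
```

Calling `measure_sync_write_rate("/path/to/mounted/share")` (a placeholder path) returns a single number that can be compared loosely against the 500 to 2000 range. Results vary with caching, network load, and mount options, so treat the figure as indicative only.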

If you are encountering the issues described in this article, your Unitrends appliance must be redeployed after your Isilon is properly configured, or deployed to alternate storage if the Isilon cannot be reconfigured. Backups that cannot be archived due to storage issues will be lost during redeployment.

CAUSE

An inode is a data structure on a Linux file system that stores the metadata for a file, such as its owner, permissions, size, and the location of its data blocks. A data structure is a way of storing data so that it can be used efficiently. If an inode is dropped, there is no way to use that information. EMC Isilon systems serving NFS shares from OneFS file systems experience performance or timing issues when high-load operations such as live databases run on them. This storage is marketed for use as user file systems, which do not have the same requirements, and is not intended for server loads from live applications. The log output shown below may be seen on some storage systems by running "dmesg" at a shell prompt on your Unitrends appliance. Output like this generally points to one of two conditions:

1. The NFS storage is being overwhelmed and cannot keep up with the number of transactions being sent to it.
2. The NFS storage server file system has no free inodes left on the partition (inode usage is at 100%).
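The inode exhaustion condition in the list above can be checked from the appliance with a short script such as the one below. It is a minimal sketch, not a Unitrends tool: the function names are hypothetical, and it relies on the standard `os.statvfs` call, which for an NFS mount reports whatever the storage server chooses to expose. Some servers report zero total inodes, in which case the result is `None`.

```python
import os


def inode_usage_percent(total_inodes, free_inodes):
    """Return the percentage of inodes in use, or None if the total is unknown."""
    if total_inodes <= 0:
        return None  # the file system does not report an inode count
    return 100.0 * (total_inodes - free_inodes) / total_inodes


def check_inodes(mount_point):
    """Query the file system mounted at `mount_point` and return inode usage."""
    stats = os.statvfs(mount_point)
    # f_files is the total inode count, f_ffree the number of free inodes
    return inode_usage_percent(stats.f_files, stats.f_ffree)
```

For example, `check_inodes("/path/to/mounted/share")` (a placeholder path) returning a value at or near 100.0 would be consistent with the second condition. A low value suggests looking at storage load instead.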

Below is an example of the type of output found in dmesg that signifies the loss of data:

[<ffffffff8153c797>] ? _spin_unlock_irqrestore+0x17/0x20
[<ffffffff81127920>] ? sync_page+0x0/0x50
[<ffffffff81539a73>] io_schedule+0x73/0xc0
[<ffffffff8112795d>] sync_page+0x3d/0x50
[<ffffffff8153a55f>] __wait_on_bit+0x5f/0x90
[<ffffffff81127b93>] wait_on_page_bit+0x73/0x80
[<ffffffff810a18a0>] ? wake_bit_function+0x0/0x50
[<ffffffff8113dcf2>] ? pagevec_lookup+0x22/0x30
[<ffffffff81140045>] invalidate_inode_pages2_range+0x2b5/0x3b0
[<ffffffff81140157>] invalidate_inode_pages2+0x17/0x20
[<ffffffffa03634a3>] nfs_revalidate_mapping+0x223/0x2a0 [nfs]
[<ffffffffa03604a7>] nfs_file_read+0x77/0x130 [nfs]
[<ffffffff8119204a>] do_sync_read+0xfa/0x140
[<ffffffff81090ece>] ? send_signal+0x3e/0x90
[<ffffffff810a1820>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81091306>] ? group_send_sig_info+0x56/0x70
[<ffffffff8109135f>] ? kill_pid_info+0x3f/0x60
[<ffffffff81232026>] ? security_file_permission+0x16/0x20
[<ffffffff81192945>] vfs_read+0xb5/0x1a0
[<ffffffff811936f6>] ? fget_light_pos+0x16/0x50
[<ffffffff81192c91>] sys_read+0x51/0xb0
[<ffffffff810e8c2e>] ? __audit_syscall_exit+0x25e/0x290
[<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
INFO: task postgres:5503 blocked for more than 120 seconds.
Not tainted 2.6.32-573.26.1.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
postgres D 0000000000000003 0 5503 2849 0x00000080
ffff88000e943b38 0000000000000086 0000000000000000 ffff8800282d0a40
ffffffff81ab8740 0000000000000082 00036d7dc242a351 0000000000000082
ffff88000e943b28 00000001397799b1 ffff8801a9215068 ffff88000e943fd8
Call Trace:
[<ffffffff81127920>] ? sync_page+0x0/0x50
[<ffffffff81539a73>] io_schedule+0x73/0xc0
[<ffffffff8112795d>] sync_page+0x3d/0x50
[<ffffffff8153a55f>] __wait_on_bit+0x5f/0x90
[<ffffffff81127b93>] wait_on_page_bit+0x73/0x80
[<ffffffff810a18a0>] ? wake_bit_function+0x0/0x50
[<ffffffff81127a37>] ? unlock_page+0x27/0x30
[<ffffffff81140045>] invalidate_inode_pages2_range+0x2b5/0x3b0
[<ffffffff81140157>] invalidate_inode_pages2+0x17/0x20
[<ffffffffa03634a3>] nfs_revalidate_mapping+0x223/0x2a0 [nfs]
[<ffffffffa03604a7>] nfs_file_read+0x77/0x130 [nfs]
[<ffffffff8119204a>] do_sync_read+0xfa/0x140
[<ffffffff81090ece>] ? send_signal+0x3e/0x90
[<ffffffff810a1820>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81091306>] ? group_send_sig_info+0x56/0x70
[<ffffffff8109135f>] ? kill_pid_info+0x3f/0x60
[<ffffffff81232026>] ? security_file_permission+0x16/0x20
[<ffffffff81192945>] vfs_read+0xb5/0x1a0
[<ffffffff811936f6>] ? fget_light_pos+0x16/0x50
[<ffffffff81192c91>] sys_read+0x51/0xb0
[<ffffffff810e8c2e>] ? __audit_syscall_exit+0x25e/0x290
[<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
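
Traces like the one above can be long, so it may be easier to scan dmesg output programmatically. The following sketch assumes the message format shown in this article ("INFO: task NAME:PID blocked for more than N seconds.") and reports which hung tasks have frames from the nfs kernel module in their call trace. The function name and the returned dictionary layout are illustrative choices, not part of any Unitrends or kernel interface.

```python
import re

# Matches hung-task reports such as:
#   INFO: task postgres:5503 blocked for more than 120 seconds.
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(dmesg_text):
    """Return one dict per hung-task report found in `dmesg_text`."""
    reports = []
    current = None
    for line in dmesg_text.splitlines():
        match = HUNG_TASK.search(line)
        if match:
            current = {
                "task": match.group("name"),
                "pid": int(match.group("pid")),
                "seconds": int(match.group("secs")),
                "nfs_frames": [],
            }
            reports.append(current)
        elif current is not None and line.rstrip().endswith("[nfs]"):
            # Frame lines look like:
            #   [<addr>] nfs_file_read+0x77/0x130 [nfs]
            symbol = line.split()[-2].split("+")[0]
            current["nfs_frames"].append(symbol)
    return reports
```

Passing the saved output of "dmesg" to `find_hung_tasks` gives a list in which any entry with a non-empty `nfs_frames` value was blocked inside the NFS client, as the postgres task is in the example above.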
