Out of Space (Special database mode to allow deletions without consuming freed space)

SUMMARY

Details on how to start troubleshooting out of space issues.

ISSUE

An appliance can run out of space for various reasons, including log levels that are set too high or too much data being ingested at the same time. This article will help you narrow down the cause of the problem and fix it.

For more information on "Out of Space" conditions see Error: No more space on device

RESOLUTION

  1. SSH to the appropriate appliance
  2. Use the command below to display storage usage, the appliance asset, and the Unitrends version.
    With Support macro:  [isi
    
    Without:  
    clear;echo;dpu asset;echo "Version: `rpm -qa|grep unitrends-rr|grep -v windowmgr|awk -F'-' '{print $3 "-" $4}'|awk -F'.' '{print $1"."$2"."$3}'`";echo "Running: `cat /etc/redhat-release|sed -e 's/Recovery/Cent/g'|awk  '{print $1, $3}' | grep -iv uni`";echo "Up for: `uptime|awk '{print $3, $4}'|sed 's/,$//'`";echo "Load average: `cat /proc/loadavg|awk '{print $1, $2, $3}'`";if [[ $(ps aux|grep connector_rc|grep -v grep >/dev/null 2>&1;echo $?) -eq 0 ]];then echo "iTivity Tunnel: Asset = `dpu asset|awk '{print $NF}'`";elif [[ $(ps aux|grep 222|grep support  >/dev/null 2>&1;echo $?) -eq 0 ]];then echo "Legacy Tunnel Number: `ps -leaf|grep 222|grep support|awk '{print $24}'|awk -F':' '{print $1}'`";else echo "No Tunnel";fi;echo;if [[ -e `ps -leaf|grep tasker|grep -v grep|awk '{print $NF}'` ]];then echo "Tasker is running";else echo -e "\e[1;31mTasker is not running.\e[0m";fi;if [[ -e `ps -leaf|grep devmon|grep -v grep|awk '{print $NF}'` ]];then echo "Devmonitor is running";else echo -e "\e[1;31mDevmonitor is not running.\e[0m";fi;echo;echo "Processor count: $(cat /proc/cpuinfo|grep processor|wc -l)";echo;echo "Memory Usage: ";free -m|grep Mem|awk '{print "Total: "$2 "MB"}';free -m|grep Mem|awk '{print "Used: "$3 "MB"}';free -m|grep Mem|awk '{print "Free: "$4 "MB"}';echo;echo;df -h;echo;echo;if [[ $(ifconfig|grep -iq tun;echo $?) -eq 0 ]];then if [[ $(/usr/bp/bin/bputil -g -c `hostname` "Replication" "Enabled" -1 master_ini) == [Yy]es ]];then echo "This Unit has tun0 and Replication enabled";echo;elif [[ $(/usr/bp/bin/bputil -g -c `hostname` "Securesync" "AutoSyncEnabled" -1 master_ini) == [Yy]es ]];then echo "This Unit has tun0 and Vaulting enabled.";echo;elif [[ $(psql -U postgres bpdb -c "select name, role from bp.systems"|grep `hostname`|grep -v .dpu|awk '{print $NF}') == DPU ]];then echo "We have tun0, but no Vaulting or Replication.";echo "This may be a Managed System.";echo;elif [[ $(psql -U postgres bpdb -c "select name, role from bp.systems"|grep `hostname`|grep -v .dpu|awk '{print $NF}') == Vault ]];then echo "We have tun0 and this is a Target.";echo;elif [[ $(psql -U postgres bpdb -c "select name, role from bp.systems"|grep `hostname`|grep -v .dpu|awk '{print $NF}') == Manager ]];then echo "This has tun0 and is a Manager.";else echo "Something is broken.";fi;psql -U postgres bpdb -c "select * from bp.managers";echo;else if [[ $(/usr/bp/bin/bputil -g -c `hostname` "Replication" "Enabled" -1 master_ini) == [Yy]es ]];then echo "This unit has Replication enabled";echo "but does not have tun0";echo;psql -U postgres bpdb -c "select * from bp.managers";echo;elif [[ $(/usr/bp/bin/bputil -g -c `hostname` "Securesync" "AutoSyncEnabled" -1 master_ini) == [Yy]es ]];then echo "This unit has Vaulting Enabled";echo "but does not have tun0";echo;psql -U postgres bpdb -c "select * from bp.managers";echo;elif [[ $(psql -U postgres bpdb -c "select name, role from bp.systems"|grep `hostname`|grep -v .dpu|awk '{print $NF}') == DPU ]];then echo "No tun0 but no vaulting/replication either.";echo;echo;elif [[ $(psql -U postgres bpdb -c "select name, role from bp.systems"|grep `hostname`|grep -v .dpu|awk '{print $NF}') == Vault ]];then echo "We don't have tun0, but this is a target.";echo;elif [[ $(psql -U postgres bpdb -c "select name, role from bp.systems"|grep `hostname`|grep -v .dpu|awk '{print $NF}') == Manager ]];then echo "This unit does not have tun0, but is a Manager.";echo;else echo "Something is broken.";echo;fi;fi
  3. From the output, verify where the high data usage is happening. 
    1. If high usage (100%) is shown in /, contact support.
    2. If high usage is shown in the database and the system is running CentOS 5, contact support to perform a CentOS 5 to CentOS 6 upgrade.
    3. If high usage is shown in the database and the system is running CentOS 6, a database dump and reload may be required. Contact support for assistance.
    4. If high usage is shown in /backups, continue with the steps below.
  4. Check the Capacity Report in the Legacy Interface to verify that the data footprint does not exceed the capacity allowed on the appliance.
  5. Also, verify whether any Instant Recovery space is in use by going to Settings > Storage and Retention > Storage Allocation. This storage may need to be recovered for use as backup space if not enough is available on the appliance.
  6. If problems are still encountered, check the free space on the appliance. You will probably see no Free Space available in the output of:
    /usr/bp/bin/smgr_display
  7. If no Free Space is available, stop all services:
    /etc/init.d/bp_rcscript stop
  8. Restart the database:
    /usr/bp/bin/start_db.sh
  9. Start devmonitor:
    /usr/bp/bin/devmonitor
  10. Start cryptodaemon.  This impacts the landing zone in smgr_display.
    /usr/bp/bin/cryptoDaemon
  11. Start filededup: 
    /usr/bp/bin/fileDedup
  12. Start retentionMgr:
    /usr/bp/bin/retentionMgr
  14. Go to the UI and clear out all failed backups.  
  15. Check whether any synthetic jobs are running by looking in the Backup Browser. If synthetic jobs are running, kill them with the following command:
    psql bpdb -U postgres -c "update bp.backups set status='25168' where backup_no='X'";
    Note: In this command, the status is always '25168' and X is the backup ID number.
  16. With these processes started, the UI will be active, but no backups or auto synthesis will be running. Because the appliance was taken offline, any jobs that were running have released the reservations they held in smgr_display. This is the best scenario for the appliance: fileDedup is running and making backups smaller, and no other services are running and adding to storage. In this mode, you can select other backups to remove (preferably failed ones) from the Backup Browser (Settings > Storage and Retention > Backup Browser). Removing a backup also removes any other backups in its chain. After a few moments, df -h and /usr/bp/bin/smgr_display should show free space growing.
  17. If you are still seeing problems, check whether /usr/bp/bin/retentionMgr (or spacereclaim in versions prior to 9.0) is running by using the commands below. If it has been running longer than the time it has taken to clear backups, it is probably a stale session. If it is running and matches the landing zone, it has not picked up the latest changes. Kill all retentionMgr or spacereclaimer jobs, and a new one will start automatically.
    ps aux | grep spacereclaim
    ps aux | grep retentionMgr
    
  18. When you have finished reclaiming space, stop all processes and then bring everything back online:
    /etc/init.d/bp_rcscript stop
    
    /etc/init.d/bp_rcscript start
  19. If the space condition was caused by a synthetic backup claiming any or all available space, start a new one-time full backup of the client whose backup you failed in step 15.
  20. If you do all of this and are still seeing problems, you may have database bloat on the appliance.  Please contact support for assistance if this is the case.  
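
Two of the checks above are easy to misread by eye: finding which filesystem is full (step 3) and judging whether a retentionMgr or spacereclaim process has been running too long (step 17). The following is a minimal sketch of helpers for both. The function names full_filesystems and etime_to_seconds and the default 90% threshold are assumptions made for illustration; they are not part of the Unitrends tooling. The helpers only read text, so they do not change anything on the appliance.

```shell
# Hypothetical helpers for illustration only; not shipped with the appliance.

# full_filesystems [THRESHOLD]
# Reads `df -P` output on stdin and prints "<mount point> <use%>" for each
# filesystem at or above THRESHOLD percent (assumed default: 90).
# Mount points that contain spaces are not handled.
full_filesystems() {
    awk -v t="${1:-90}" 'NR > 1 {
        use = $5                 # Capacity column, e.g. "100%"
        sub(/%/, "", use)        # strip the percent sign
        if (use + 0 >= t) print $6, use "%"
    }'
}

# etime_to_seconds ETIME
# Converts the [[DD-]HH:]MM:SS elapsed time printed by `ps -o etime=`
# into a number of seconds.
etime_to_seconds() {
    echo "$1" | awk '{
        n = split($1, dh, "-")          # optional "DD-" prefix
        days = (n == 2) ? dh[1] : 0
        k = split(dh[n], t, ":")        # HH:MM:SS or MM:SS
        if (k == 3) print days * 86400 + t[1] * 3600 + t[2] * 60 + t[3]
        else        print days * 86400 + t[1] * 60 + t[2]
    }'
}
```

For example, df -P | full_filesystems 95 lists only the filesystems that are at least 95% full, and etime_to_seconds "$(ps -o etime= -p PID)" returns how long the process with that PID has been running, which can be compared against the time spent clearing backups.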