Another Incremental Back-up script?

Recently, we published a similar guide on creating an incremental back-up script, which was nice, but could be improved. The retention period of the previous script was determined by the interval and the number of increments. E.g. a daily interval with 14 increments translated to a retention period of 14 days. But what if we want a retention period of a month? Or ten years? With a daily interval, that would mean 30 or 3650 increments, respectively.

Even with incremental back-ups, that many increments need lots of disk space. Moreover, a single fixed interval simply isn't flexible, which makes it unsuitable as a proper back-up solution. We've created a new iteration that lets you set multiple intervals (e.g. hourly, daily, weekly, monthly, yearly) and the number of increments per interval.

This will hopefully be the last back-up script you'll need in a long time. It's the only one I'm using anyways.

TL;DR: The entire script

#!/bin/bash
# Title: Perfacilis Incremental Back-up script
# Description: Create back-ups of dirs and dbs by copying them to Perfacilis' back-up servers
# We strongly recommend to put this in /etc/cron.hourly
# Author: Roy Arisse <support@perfacilis.com>
# See: https://admin.perfacilis.com
# Version: 0.7
# Usage: bash /etc/cron.hourly/backup

readonly BACKUP_LOCAL_DIR="/backup"
readonly BACKUP_DIRS=($BACKUP_LOCAL_DIR /home /root /etc /var/www)

readonly RSYNC_TARGET="profile@backup.perfacilis.com"
readonly RSYNC_DEFAULTS="-trlqz4 --delete --delete-excluded --prune-empty-dirs"
# Quote the wildcard pattern so the shell doesn't expand it in the current directory
readonly RSYNC_EXCLUDE=(tmp/ temp/ log/ logs/ '*.log' ImapMail/ .Gulden/)
readonly RSYNC_SECRET='RSYNCSECRETHERE'

readonly MYSQL="mysql -uUSER -pPASSWORD"
readonly MYSQLDUMP="mysqldump -uUSER -pPASSWORD --events --routines --max-allowed-packet=512MB --quick --quote-names --skip-comments"

# Amount of increments per interval and duration per interval resp.
readonly -A INCREMENTS=([hourly]=24 [daily]=7 [weekly]=4 [monthly]=12 [yearly]=5)
readonly -A DURATIONS=([hourly]=3600 [daily]=86400 [weekly]=604800 [monthly]=2419200 [yearly]=31536000)

# ++++++++++ NO CHANGES REQUIRED BELOW THIS LINE ++++++++++

export LC_ALL=C

log() {
local MSG="$1"
logger -p local0.notice -t "$(basename "$0")" -- "$MSG"

# Interactive shell
if tty -s; then
echo "$MSG"
fi
}

prepare_local_dir() {
[ -d $BACKUP_LOCAL_DIR ] || mkdir -p $BACKUP_LOCAL_DIR
}

prepare_remote_dir() {
local TARGET="$1"
local RSYNC_OPTS=$(get_rsync_opts)
local EMPTYDIR=$(mktemp -d)
local DIR TREE

if [ -z "$TARGET" ]; then
echo "Usage: prepare_remote_dir remote/dir/structure"
exit 1
fi

# Remove options that delete the empty dir
RSYNC_OPTS=$(echo "$RSYNC_OPTS" | sed -E 's/--(delete|delete-excluded|prune-empty-dirs)//g')

for DIR in ${TARGET//\// }; do
TREE="$TREE/$DIR"
rsync $RSYNC_OPTS $EMPTYDIR/ $RSYNC_TARGET/${TREE/#\//}
done

rm -rf $EMPTYDIR
}

get_last_inc_file() {
local PERIOD="$1"

if [ -z "$PERIOD" ]; then
echo "Usage: ${FUNCNAME[0]} daily"
exit 1
fi

echo "$BACKUP_LOCAL_DIR/last_inc_$PERIOD"
}

get_next_increment() {
local PERIOD="$1"
local LIMIT="${INCREMENTS[$PERIOD]}"
local LAST NEXT INCFILE

if [ -z "$PERIOD" -o -z "$LIMIT" ]; then
echo "Usage: get_next_increment period"
echo "- period = 'hourly', 'daily', 'weekly', 'monthly', 'yearly'"
exit 1
fi

INCFILE=$(get_last_inc_file $PERIOD)
if [ -f "$INCFILE" ]; then
LAST=$(cat "$INCFILE" | tr -d "\n")
fi

if [ -z "$LAST" ]; then
echo 0
return
fi

NEXT=$(($LAST+1))
if [ "$NEXT" -gt "$LIMIT" ]; then
echo 0
return
fi

echo $NEXT
}

get_intervals_to_backup() {
local NOW=$(date +%s)
local LAST PERIOD INCFILE DURATION DIFF
local PERIODS=()

for PERIOD in "${!DURATIONS[@]}"; do
# Skip disabled intervals
if [[ ${INCREMENTS[$PERIOD]} -eq 0 ]]; then
continue;
fi

LAST=0
INCFILE=$(get_last_inc_file $PERIOD)
if [ -f "$INCFILE" ]; then
LAST=$(date +%s -r "$INCFILE")
fi

DURATION=${DURATIONS[$PERIOD]}
DIFF=$(($NOW - $LAST))
if [ $DIFF -gt $DURATION ]; then
PERIODS+=("$PERIOD")
fi
done

echo "${PERIODS[*]}"
}

get_rsync_opts() {
local EXCLUDE=`dirname $0`/rsync.exclude
local SECRET=`dirname $0`/rsync.secret
local OPTS="$RSYNC_DEFAULTS"

# Only write an exclude file when the array has entries
if [ "${#RSYNC_EXCLUDE[@]}" -gt 0 ]; then
if [ ! -f $EXCLUDE ]; then
printf '%s\n' "${RSYNC_EXCLUDE[@]}" > $EXCLUDE
chmod 600 $EXCLUDE
fi

OPTS="$OPTS --exclude-from=$EXCLUDE"
fi

# Only pass a password file when a secret is actually set
if [ -n "$RSYNC_SECRET" ]; then
if [ ! -f $SECRET ]; then
echo "$RSYNC_SECRET" > $SECRET
chmod 600 $SECRET
fi

OPTS="$OPTS --password-file=$SECRET"
fi

echo "$OPTS"
}

backup_packagelist() {
local TODO=$(get_intervals_to_backup)

if [ -z "$TODO" ]; then
return
fi

log "Back-up list of installed packages"
dpkg --get-selections > $BACKUP_LOCAL_DIR/packagelist.txt
}

backup_mysql() {
local TODO=$(get_intervals_to_backup)
local DB

if [ -z "$TODO" ]; then
return
fi

log "Back-up mysql databases:"
for DB in `$MYSQL -e 'show databases' | grep -v 'Database'`; do
if [ "$DB" = 'information_schema' -o "$DB" = 'performance_schema' ]; then
continue
fi

log "- $DB"
$MYSQLDUMP $DB | gzip > $BACKUP_LOCAL_DIR/$DB.sql.gz
done
}

backup_folders() {
local RSYNC_OPTS=$(get_rsync_opts)
local DIR TARGET RSYNC INC PERIOD
local VANISHED='^(file has vanished: |rsync warning: some files vanished before they could be transferred)'
local TODO=$(get_intervals_to_backup)

if [ -z "$TODO" ]; then
log "No intervals to back-up yet."
exit
fi

for PERIOD in $TODO; do
INC=$(get_next_increment $PERIOD)
log "Moving $PERIOD back-up to target: ${INC/#0/current}"

for DIR in ${BACKUP_DIRS[@]}; do
TARGET=${DIR/#\//}
TARGET=${TARGET//\//_}

RSYNC="rsync $RSYNC_OPTS"
if [ "$INC" -gt 0 ]; then
RSYNC="rsync $RSYNC_OPTS --backup --backup-dir=$RSYNC_TARGET/$PERIOD/$INC/$TARGET"
fi

log "- $DIR"
prepare_remote_dir "$PERIOD/current"
$RSYNC $DIR/ $RSYNC_TARGET/$PERIOD/current/$TARGET 2>&1 | (grep -E -v "$VANISHED" || true)
done

done
}

signoff_increments() {
local TODO=$(get_intervals_to_backup)
local PERIOD INC INCFILE

for PERIOD in $TODO; do
INC=$(get_next_increment $PERIOD)
INCFILE=$(get_last_inc_file $PERIOD)
echo $INC > "$INCFILE"
done
}

cleanup() {
rm -f `dirname $0`/rsync.exclude
rm -f `dirname $0`/rsync.secret
}

main() {
log "Back-up initiated at `date`"

trap "cleanup" EXIT

prepare_local_dir

backup_packagelist
backup_mysql
backup_folders

signoff_increments

log "Back-up completed at `date`"
}

main

How it works

Setting increments

The INCREMENTS variable stores the number of increments to save per period. The DURATIONS variable stores how long a period is, in seconds; it only needs changing if you want to alter a duration or add new periods.

In INCREMENTS, you can set the number to "0" to disable that period entirely. For every increment you include, a folder on the back-up target location is created automatically.

Keep in mind that both variables are associative arrays, so make sure the formatting is right. If you're interested, Andy Balaam's blog post is a great explanation. If you're not, just look at the current formatting and change it as you wish.
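As a rough illustration of how the two arrays work together, the sketch below multiplies the number of increments by the period duration to estimate how far back each period reaches. The retention_days helper and the PERIOD_ORDER list are assumptions made for this example; they are not part of the script. Bash doesn't guarantee the iteration order of associative arrays, hence the explicit list.

```bash
#!/bin/bash
# Same values as in the script above
declare -A INCREMENTS=([hourly]=24 [daily]=7 [weekly]=4 [monthly]=12 [yearly]=5)
declare -A DURATIONS=([hourly]=3600 [daily]=86400 [weekly]=604800 [monthly]=2419200 [yearly]=31536000)

# Hypothetical helper: approximate retention of a period in whole days
retention_days() {
    local PERIOD="$1"
    echo $(( ${INCREMENTS[$PERIOD]} * ${DURATIONS[$PERIOD]} / 86400 ))
}

# Associative arrays have no guaranteed order, so list the periods explicitly
PERIOD_ORDER=(hourly daily weekly monthly yearly)

for PERIOD in "${PERIOD_ORDER[@]}"; do
    echo "$PERIOD: ${INCREMENTS[$PERIOD]} increments, roughly $(retention_days "$PERIOD") days"
done
```

Note that the default monthly duration of 2419200 seconds equals 28 days, so twelve monthly increments cover roughly 336 days rather than a full calendar year.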

Installation

Copy the contents of the entire script into a file named "backup" and store it in "/etc/cron.hourly":

sudo nano /etc/cron.hourly/backup
sudo chmod +x /etc/cron.hourly/backup

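Don't forget: if you've configured an hourly or even shorter period, the script needs to be called more often than once an hour, because a period is only backed up once more than its full duration has passed since its last increment. In that case, save the file somewhere else and call it accordingly from /etc/crontab (or any other method you like).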

The following variables probably need changing:

  • BACKUP_LOCAL_DIR: Folder to keep required tracking files;
  • BACKUP_DIRS: The folders you want to have back when your computer or server dies; don't remove $BACKUP_LOCAL_DIR;
  • RSYNC_TARGET: Where the actual back-up should be stored: the remote, possibly off-site, location;
  • RSYNC_SECRET: Optional, only needed if the rsync profile on the remote server requires a secret;
  • MYSQL: Either change the username/password parameters, or use --defaults-file=/etc/mysql/debian.cnf;
  • MYSQLDUMP: Same as the MYSQL variable.
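For the --defaults-file variant, the two variables could look like the sketch below. The MYSQL_DEFAULTS variable is introduced here for illustration only, and the path assumes a Debian-style system where that credentials file exists and is readable by the user running the script.

```bash
#!/bin/bash
# Assumption: Debian-style credentials file; adjust the path to your system
readonly MYSQL_DEFAULTS="/etc/mysql/debian.cnf"

# --defaults-file has to be the first option on the command line
readonly MYSQL="mysql --defaults-file=$MYSQL_DEFAULTS"
readonly MYSQLDUMP="mysqldump --defaults-file=$MYSQL_DEFAULTS --events --routines --max-allowed-packet=512MB --quick --quote-names --skip-comments"

echo "$MYSQL"
```

This keeps the database password out of the script itself, and out of the process list while a dump is running.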

The following variables only need checking:

  • INCREMENTS: Change the number of increments you want to keep per period, in addition to the full back-up.

Full and Incremental Back-ups

The first time the script runs, it creates one full back-up per period and stores it in a "current" folder, e.g. daily/current, hourly/current, etc.

On the next run for a period, the first increment is stored in folder 1, e.g. daily/1. The previous versions of files modified since the last run are moved to this folder, and the latest copies are stored in the "current" folder. Every following run creates a new increment, e.g. daily/2, daily/3, etc., up to the number of increments set for that period.

Finally, when the number of increments has been reached, a new full back-up is stored in the "current" folder. After that, the increment folders are updated one by one.
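The cycle described above can be simulated without touching any files. The sketch below is a simplified stand-in for get_next_increment: the next_increment and simulate_runs functions are made up for this example and take the last increment and the limit as arguments instead of reading the tracking file.

```bash
#!/bin/bash
# Simplified variant of get_next_increment: takes the last increment
# (empty on the very first run) and the limit for the period
next_increment() {
    local LAST="$1" LIMIT="$2"

    # First run: nothing recorded yet, so start with "current"
    if [ -z "$LAST" ]; then
        echo 0
        return
    fi

    # Past the limit: wrap around to "current"
    if [ $((LAST + 1)) -gt "$LIMIT" ]; then
        echo 0
        return
    fi

    echo $((LAST + 1))
}

# Print the target folder of each run, for a given number of runs and limit
simulate_runs() {
    local RUNS="$1" LIMIT="$2" LAST="" SEQ=()
    local I

    for ((I = 0; I < RUNS; I++)); do
        LAST=$(next_increment "$LAST" "$LIMIT")
        SEQ+=("${LAST/#0/current}")
    done

    echo "${SEQ[*]}"
}

simulate_runs 8 3
# → current 1 2 3 current 1 2 3
```

With a limit of 3, every fourth run lands on "current" again, after which folders 1 to 3 are reused.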

To make it easier to find files modified before a certain date or time, each folder's timestamp is updated to match the time it ran.
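The moving of older file versions into an increment folder is done by rsync's --backup and --backup-dir options. The following is a minimal local sketch of that mechanism, assuming rsync is installed; the sync_with_increment function and its arguments are invented for this example and simplify what backup_folders does.

```bash
#!/bin/bash
# Sketch: sync SRC into DEST/current. DEST must be an absolute path.
# For INC > 0, files that would be overwritten or deleted in "current"
# are moved to DEST/INC first.
sync_with_increment() {
    local SRC="$1" DEST="$2" INC="$3"
    local OPTS=(-rlt --delete)

    if [ "$INC" -gt 0 ]; then
        OPTS+=(--backup --backup-dir="$DEST/$INC")
    fi

    mkdir -p "$DEST/current"
    rsync "${OPTS[@]}" "$SRC/" "$DEST/current/"
}
```

Calling it with increment 0 fills "current"; calling it again with increment 1 after editing a file leaves the new version in "current" and the previous version in folder 1.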

Back-up to a local folder or USB disk instead of a remote Rsync server

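The RSYNC_TARGET variable dictates the remote, preferably off-site, location for the back-up, but it can also point to a local folder. Note that it needs the folder where the disk is mounted, not the device itself. For example, if your back-up USB disk /dev/sdc1 is mounted at /mnt/backup (use lsblk to find out where it's mounted), change it as follows:

readonly RSYNC_TARGET="/mnt/backup"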
readonly RSYNC_DEFAULTS="-trlqz4 --delete --delete-excluded --prune-empty-dirs"
readonly RSYNC_EXCLUDE=(/tmp /temp)
readonly RSYNC_SECRET=""

Don't forget to empty the RSYNC_SECRET variable, to ensure it all works as it should.
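When backing up to a removable disk, it may be worth checking that the disk is actually mounted before the script starts copying; otherwise the back-up could silently end up on the system disk. The check below is a suggestion and not part of the script: is_mounted is a hypothetical helper and /mnt/backup is an example path.

```bash
#!/bin/bash
# Hypothetical guard: returns 0 only when the given folder is a mount point
is_mounted() {
    local DIR="$1"
    mountpoint -q "$DIR"
}

# /mnt/backup is an example path; use the folder RSYNC_TARGET points to
if ! is_mounted "/mnt/backup"; then
    echo "Back-up disk is not mounted, skipping back-up" >&2
fi
```

Inside the back-up script, you'd probably follow the message with "exit 1", so nothing is copied while the disk is missing.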

What's BACKUP_LOCAL_DIR for?

The back-up script keeps track of which increment it last completed by storing a file per period in the BACKUP_LOCAL_DIR folder, e.g. "last_inc_hourly", "last_inc_daily", etc. The timestamp of these files is used to determine when that increment was created, to see if the period has elapsed. If you remove these files, or the dir entirely, the script will start with "current" as explained above.

This ensures that if a run was missed (because your laptop was powered off, or because your server was rebooting), the next increment is created as soon as it powers on again, though not sooner than the given period's duration.
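The timestamp check can be tried in isolation. The sketch below mirrors the comparison in get_intervals_to_backup using a temporary file; the period_elapsed helper is made up for this example, and both "date -r" and "touch -d" assume GNU coreutils.

```bash
#!/bin/bash
# Hypothetical helper mirroring the check in get_intervals_to_backup:
# succeeds when FILE is missing or older than DURATION seconds
period_elapsed() {
    local FILE="$1" DURATION="$2"
    local NOW LAST=0

    NOW=$(date +%s)
    if [ -f "$FILE" ]; then
        # Modification time of the tracking file as a Unix timestamp
        LAST=$(date +%s -r "$FILE")
    fi

    [ $((NOW - LAST)) -gt "$DURATION" ]
}

# Demo with a temporary tracking file that was "written" two hours ago
TRACKER=$(mktemp)
touch -d '2 hours ago' "$TRACKER"
period_elapsed "$TRACKER" 3600 && echo "hourly: due"
period_elapsed "$TRACKER" 86400 || echo "daily: not due yet"
rm -f "$TRACKER"
```

A missing tracking file counts as timestamp 0, which is why a fresh installation backs up every enabled period on its first run.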

Finally, this folder contains a local copy of all created database dumps.

Conclusion

Is this the final back-up script we'll ever create? Probably not, there's always room for improvement. This script lets you create proper back-ups you can rely on, spanning long retention periods without requiring an unhealthy amount of disk space. In our opinion, it's a healthy mix between incremental and full back-ups, allowing for proper disaster recovery.

The current script will only function on Linux systems, or at least systems running Bash. It's unknown whether it will run under the Windows Subsystem for Linux on Windows 10. A proper Windows shell port of this script would therefore be a very welcome addition.

Finally, the script is probably not suitable for the less tech-savvy among us, but that, in my humble opinion, might be a pretty good user filter in itself: if you can't get it running, don't use it.

Perfacilis Back-up Service

The Perfacilis Back-up service alerts you when a back-up didn't finish within the scheduled time frame, or when a lot of data changes at once (which might indicate an encrypting virus, i.e. ransomware). You only pay for the amount of space your back-ups use.
