We've been making back-ups for others for quite some years now. Before, we were using full back-ups, creating a simple copy of the entire source system for every day of the week. This meant the size of the back-up was seven times the size of the source system. To improve on cost and efficiency, we switched over to incremental back-ups.

This article shows how we arrived at a solution for creating incremental back-ups over rsync to a remote location. If you're not interested in how we got there, just copy The Final Script, modify it to your needs and off you go.

Basic full back-ups

If you have any experience with Rsync, you might know it's amazingly easy to copy your important files to a back-up location. As the simple example below shows, the logic doesn't really change much if you're using an external disk, a remote server running rsyncd, or running rsync over ssh:

# Rsync to external disk
rsync -av /home/$USER/Documents /mount/externaldisk/backup

# Rsync to server running rsyncd
rsync -avz /home/$USER/Documents username@backup.perfacilis.com::profile/Documents

# Rsync to ssh server
rsync -avz /home/$USER/Documents username@ssh.server.com:/backup/Documents

This is as basic as it gets and even though it gets the job done, it's far from a robust solution.

Backing-up multiple folders

Using the magic of a bash script, you can set up an array of the folders you want to back up and loop through them:

#!/bin/bash

readonly BACKUP_DIRS=(/etc /home/$USER /root /var/www)

readonly RSYNC_PROFILE="user@backup.perfacilis.com::profile"
readonly RSYNC_DEFAULTS="-avz"

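backup_folders() {
  local DIR TARGET

  # Quote the array so folder names containing spaces stay intact
  for DIR in "${BACKUP_DIRS[@]}"; do
    # Strip the leading slash, then turn the remaining slashes into
    # underscores: /var/www becomes var_www
    TARGET=${DIR/#\//}
    TARGET=${TARGET//\//_}
    # $RSYNC_DEFAULTS stays unquoted so it splits into separate options
    rsync $RSYNC_DEFAULTS "$DIR/" "$RSYNC_PROFILE/$TARGET"
  done
}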

main() {
backup_folders
}

main

Modify the BACKUP_DIRS array to your needs, separating each folder with a space, and save the text as backup.sh. Then run it from your command line with bash backup.sh and your back-up will run. Optionally, you can make the file executable by running chmod +x backup.sh, so you can run it without calling bash: ./backup.sh.
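
The two parameter expansions inside the loop decide what each folder is called on the target. Here is a minimal sketch of just that step. The helper name dir_to_target and the sample paths are mine, for illustration only; they are not part of the script.

```shell
#!/bin/bash

# Hypothetical helper: mirrors the two expansions used in backup_folders.
dir_to_target() {
  local TARGET=${1/#\//}   # strip the leading slash: /var/www -> var/www
  TARGET=${TARGET//\//_}   # replace the remaining slashes: var/www -> var_www
  echo "$TARGET"
}

# Print the target name for a few sample source folders.
for DIR in /etc /home/alice /var/www; do
  echo "$DIR -> $(dir_to_target "$DIR")"
done
```

Flattening the path this way keeps every source folder in its own top-level folder on the target, so /var/www and /etc can never overwrite each other.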

A note on Bash code clarity

To keep the script clean, I'll be splitting logic into functions, even though this results in more code. Don't forget the last line where main is called, otherwise nothing will happen. If your script goes weird, use bash -x ./backup.sh for debugging. Also, you can use man bash as a language reference.

Automating your back-up

After modifying the example above to your needs, you'll have a very simple back-up solution. Personally, I don't have the discipline to manually run it every day, so let's automate it.

Schedule your back-up using Crontab (if your computer/server is always on)

First, ensure you've saved your back-up script in a location that works for you; for this example we'll use /home/user/backup.sh. If you've got root access, you can put your back-up script in the system-wide crontab using nano /etc/crontab. Otherwise you can modify your personal crontab using crontab -e. Then add the following line:

1 1  * * *     user     bash /home/user/backup.sh

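The example above runs every day at 01:01. The user column only belongs in the system-wide /etc/crontab; leave it out when editing your personal crontab with crontab -e. To change the timing, see man 5 crontab for more details on the cron-table format.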

Optionally, you can put your back-up script in the /etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly or /etc/cron.monthly folders instead of using crontab files. Ensure the script file is executable and its name has no special characters (see man run-parts for filename restrictions), for example:

mv /home/user/backup.sh /etc/cron.daily/backup
chmod +x /etc/cron.daily/backup
chown $USER /etc/cron.daily/backup

Schedule your back-up on an Interval (optionally, if your computer is not always on)

Using the Crontab scheduler as described above has one big downside: if your computer is powered off at the scheduled time, your back-up doesn't run. Using a systemd timer instead of Crontab, you can set it to run when a schedule was missed or even when your computer boots, but some Linux distributions don't ship systemd or are strictly against it.

By adding a bit of extra logic to our existing backup script, we can check when it last executed. For my laptop for example, I use something like the example below, with an interval of 8 hours (the interval is defined in seconds: 3600 * 8 = 28800) or as soon as it missed a schedule:

#!/bin/bash

readonly BACKUP_LOCAL_DIR="/home/$USER/backup/"
readonly BACKUP_DIRS=(/etc /home/$USER /root /var/www)

readonly RSYNC_PROFILE="user@backup.perfacilis.com::profile"
readonly RSYNC_DEFAULTS="-trlvz --delete --delete-excluded --prune-empty-dirs"

readonly INTERVAL=28800

prepare_local_dir() {
if [ ! -d $BACKUP_LOCAL_DIR ]; then
mkdir -p $BACKUP_LOCAL_DIR
touch -d "January 1 1990" $BACKUP_LOCAL_DIR
fi
}

check_interval() {
local LAST=$(stat -c %Y $BACKUP_LOCAL_DIR)
local NOW=$(date +%s)
local ELAPSED=$(($NOW - $LAST))

if [ "$ELAPSED" -lt "$INTERVAL" ]; then
  echo "Last backup was ${ELAPSED}s ago, which is less than ${INTERVAL}s."
  exit
fi
}

backup_folders() {
local DIR TARGET

for DIR in ${BACKUP_DIRS[@]}; do
TARGET=${DIR/#\//}
TARGET=${TARGET//\//_}
rsync $RSYNC_DEFAULTS $DIR/ $RSYNC_PROFILE/$TARGET
done
}

signoff_interval() {
touch $BACKUP_LOCAL_DIR
}

main() {
prepare_local_dir
check_interval
backup_folders
signoff_interval
}

main

The prepare_local_dir function creates a local back-up directory if it doesn't exist and backdates its modified time, so the first run is always due. The check_interval function exits the script if the folder's modified time is less than the given interval ago. Finally, signoff_interval updates the back-up folder's modified time again, so it's set up for the next execution.

You do need to change your Crontab to ensure the back-up script runs more often, for example every hour:

1 *  * * *     user     bash /home/user/backup.sh

The check_interval function ensures the time between two actual rsync runs is never less than the given $INTERVAL.
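
To see the interval logic on its own, here is a small sketch that uses a throw-away directory as the marker. The helper name is_due is hypothetical, and like the script it assumes GNU stat and touch.

```shell
#!/bin/bash

# Hypothetical helper: succeeds (exit status 0) when MARKER was last modified
# at least INTERVAL seconds ago. Relies on GNU stat (-c %Y).
is_due() {
  local MARKER=$1 INTERVAL=$2
  local LAST NOW
  LAST=$(stat -c %Y "$MARKER")
  NOW=$(date +%s)
  [ $((NOW - LAST)) -ge "$INTERVAL" ]
}

MARKER=$(mktemp -d)

touch -d "January 1 1990" "$MARKER"   # pretend the last run was long ago
is_due "$MARKER" 28800 && echo "back-up is due"

touch "$MARKER"                       # sign off: the last run is now
is_due "$MARKER" 28800 || echo "too soon, skipping"

rm -rf "$MARKER"
```

Because the timestamp lives on the directory itself, no extra state file is needed. The catch is that anything else touching that directory also resets the clock.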

Incremental back-ups with rsync

Looking through Rsync's examples page, I found the basis of the incremental back-up script below. The idea is to save the latest version of all our files in a "current" directory and move older versions of files into a separate, numbered increment directory. Again we'll have to add more bash logic:

#!/bin/bash

readonly BACKUP_LOCAL_DIR="/home/$USER/backup"
readonly BACKUP_DIRS=(/etc /home/$USER /root /var/www)

readonly RSYNC_PROFILE="user@backup.perfacilis.com::profile"
readonly RSYNC_DEFAULTS="-trlvz --delete --delete-excluded --prune-empty-dirs"

readonly INCREMENTS=7

prepare_local_dir() {
if [ ! -d $BACKUP_LOCAL_DIR ]; then
mkdir -p $BACKUP_LOCAL_DIR
touch -d "January 1 1990" $BACKUP_LOCAL_DIR
fi
}

prepare_remote_dir() {
local EMPTYDIR=$(mktemp -d)

rsync ${RSYNC_DEFAULTS//--delete* /} $EMPTYDIR/ $RSYNC_PROFILE/current
rm -rf $EMPTYDIR
}

get_next_increment() {
local LAST NEXT

if [ -f $BACKUP_LOCAL_DIR/last ]; then
LAST=$(cat $BACKUP_LOCAL_DIR/last | tr -d "\n")
fi

if [ -z "$LAST" ]; then
echo 0
return
fi

NEXT=$(($LAST+1))
if [ "$NEXT" -gt "$INCREMENTS" ]; then
echo 0
return
fi

echo $NEXT
}

backup_folders() {
local DIR TARGET RSYNC
local INC=$(get_next_increment)

for DIR in ${BACKUP_DIRS[@]}; do
TARGET=${DIR/#\//}
TARGET=${TARGET//\//_}

RSYNC="rsync $RSYNC_DEFAULTS"
if [ "$INC" -gt 0 ]; then
RSYNC="rsync $RSYNC_DEFAULTS --backup --backup-dir=/$INC/$TARGET"
fi

$RSYNC $DIR/ $RSYNC_PROFILE/current/$TARGET
done
}

signoff_increment() {
echo $(get_next_increment) > $BACKUP_LOCAL_DIR/last
}

main() {
prepare_local_dir
prepare_remote_dir

backup_folders

signoff_increment
}

main

First of all, prepare_remote_dir ensures the "current" directory exists in the rsync target. Then, backup_folders calls get_next_increment, which returns a number between 0 and $INCREMENTS. For 0, it creates a full backup of the given $DIR in "current"; for a number greater than 0, it saves the previous version into the $INC folder and updates the files in the "current" directory to reflect the latest version.
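
If you'd like to watch --backup and --backup-dir at work without a remote profile, the sketch below replays two runs against local temp directories. The paths and file names are made up for the demo, and it assumes rsync is installed.

```shell
#!/bin/bash
# Sketch only: uses local temp directories instead of a remote profile.

ROOT=$(mktemp -d)
mkdir -p "$ROOT/src" "$ROOT/target/current"

if command -v rsync >/dev/null; then
  # Run 0: full copy into "current".
  echo "first version" > "$ROOT/src/notes.txt"
  rsync -trl --delete "$ROOT/src/" "$ROOT/target/current/"

  # Run 1: the file changed, so rsync moves the old copy into increment "1"
  # before writing the new one to "current".
  echo "second version, edited" > "$ROOT/src/notes.txt"
  rsync -trl --delete --backup --backup-dir="$ROOT/target/1" \
    "$ROOT/src/" "$ROOT/target/current/"

  cat "$ROOT/target/current/notes.txt"   # latest contents
  cat "$ROOT/target/1/notes.txt"         # contents before run 1
fi
```

Only files that changed end up in the increment folder, which is where the space saving comes from. Remove "$ROOT" when you're done experimenting.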

For local back-ups (e.g. an external hdd), we could easily list existing increments in the target to determine the next increment. Too bad we can't list the contents of remote rsync targets; therefore we keep track of the last increment locally, in the "last" file. The signoff_increment function keeps this file up to date. Once the number of $INCREMENTS is reached, the next run re-creates a full backup in "current", thus rotating/resetting it all. This method is safe to use with both remote and local targets.

You're free to change the number of increments at any point in time. Increasing the number will create more numeric increment folders. Lowering it won't delete the higher-numbered folders though; it simply skips them.
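
The rotation is easier to follow when you simulate a few runs. This sketch uses a hypothetical next_increment helper that applies the same rule as get_next_increment, but takes its input as arguments instead of reading the "last" file.

```shell
#!/bin/bash

# Hypothetical helper: same rule as get_next_increment, but takes the last
# increment and the limit as arguments instead of reading the "last" file.
next_increment() {
  local LAST=$1 MAX=$2
  if [ -z "$LAST" ] || [ $((LAST + 1)) -gt "$MAX" ]; then
    echo 0
  else
    echo $((LAST + 1))
  fi
}

# Simulate ten consecutive runs with INCREMENTS=3.
LAST=""
SEQUENCE=""
for RUN in 1 2 3 4 5 6 7 8 9 10; do
  LAST=$(next_increment "$LAST" 3)
  SEQUENCE="$SEQUENCE$LAST "
done
echo "$SEQUENCE"
```

Calling next_increment 5 3 returns 0 as well. That is the "lowering the number" case: the next run starts over at a full back-up, and folders 4 and 5 are simply never written to again.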

The missing pieces

Rsync profile password

When using a remote rsync profile, it usually needs authentication. Instead of using the RSYNC_PASSWORD variable, it's safer to set a password file:

readonly RSYNC_SECRET='u53Y0ur0wnPa55w0rdPlz'
readonly RSYNC_DEFAULTS="-trlqz4 --delete --delete-excluded --prune-empty-dirs"

get_rsync_opts() {
local SECRET=`dirname $0`/rsync.secret

if [ ! -f $SECRET ]; then
echo $RSYNC_SECRET > $SECRET
chmod 600 $SECRET
fi

echo "$RSYNC_DEFAULTS --password-file=$SECRET"
}

backup_folders() {
local RSYNC_OPTS=$(get_rsync_opts)

rsync $RSYNC_OPTS /home/$USER /mount/externaldisk/backup/
}

cleanup() {
rm -f `dirname $0`/rsync.exclude
rm -f `dirname $0`/rsync.secret
}

main() {
trap "cleanup" EXIT

backup_folders
}

Where we've been using $RSYNC_DEFAULTS until now, we need to use $(get_rsync_opts) instead. See the simplified example above or the final script below. Also, a trap is added to the main function to ensure rsync.exclude and rsync.secret are removed when the script exits, even if it is interrupted.
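
As a possible variation, the sketch below creates the password file with restrictive permissions from the very start, rather than tightening them with chmod afterwards. The helper name write_secret and the temp directory are assumptions for the demo; the script itself stores the file next to the script.

```shell
#!/bin/bash

# Hypothetical variation on get_rsync_opts: create the password file with
# restrictive permissions from the start, instead of chmod'ing it afterwards.
write_secret() {
  local FILE=$1 SECRET=$2
  (
    umask 077                        # applies inside this subshell only
    printf '%s\n' "$SECRET" > "$FILE"
  )
}

WORKDIR=$(mktemp -d)                 # demo location, an assumption
SECRET_FILE=$WORKDIR/rsync.secret
trap 'rm -rf "$WORKDIR"' EXIT        # clean up when the script exits

write_secret "$SECRET_FILE" 'u53Y0ur0wnPa55w0rdPlz'
stat -c %a "$SECRET_FILE"            # print the file's permission bits
```

One limit to keep in mind: no trap can run when the process receives SIGKILL (kill -9), so the file can still be left behind in that case.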

MySQL backup

If you're using MySQL, you can add this simple backup_mysql function to create gzipped dumps of your databases. Don't forget to call it in the main function though.

readonly MYSQL="mysql --defaults-file=/etc/mysql/debian.cnf"
readonly MYSQLDUMP="mysqldump --defaults-file=/etc/mysql/debian.cnf --events --routines --max-allowed-packet=512MB --quick --quote-names --skip-comments"

backup_mysql() {
local DB

for DB in `$MYSQL -e 'show databases' | grep -v 'Database'`; do
if [ $DB = 'information_schema' -o $DB = 'performance_schema' ]; then
continue
fi

$MYSQLDUMP $DB | gzip > $BACKUP_LOCAL_DIR/$DB.sql.gz
done
}
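
The database filtering can be tried without a MySQL server. In this sketch the helper name filter_databases and the sample database names are made up. It matches whole lines only, which is a little stricter than the grep -v 'Database' above: that one would also drop a database whose name merely contains "Database".

```shell
#!/bin/bash

# Hypothetical helper: reads database names on stdin (one per line) and drops
# the header line and the schemas that backup_mysql skips.
filter_databases() {
  grep -vxF -e 'Database' -e 'information_schema' -e 'performance_schema'
}

# Made-up sample of what `mysql -e 'show databases'` might print.
printf '%s\n' Database information_schema performance_schema shop blog \
  | filter_databases
```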

Logging

You usually want cronjobs to remain silent, but you also want to be able to look up what your back-up script actually did or is doing. Therefore a simple log function can be added that forwards messages to the system log. It only prints to the screen if you run the script manually from an active terminal.

function log() {
local MSG=`echo $1`
logger -p local0.notice -t `basename $0` -- $MSG

if tty -s; then
echo $MSG
fi
}

log "Back-up initiated at $(date)"

Keep in mind not to call log in get_next_increment or get_rsync_opts, because their callers capture everything they echo.
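
One way around that restriction, sketched here as an assumption rather than what the script does, is to print log messages to stderr. Command substitution only captures stdout, so the message and the return value no longer mix. The function get_answer is a stand-in for any function that returns a value through echo.

```shell
#!/bin/bash

# Hypothetical variation on log: messages go to stderr, so they never end up
# in the output captured by $(...).
log() {
  echo "$1" >&2
}

# Stand-in for a function like get_next_increment that "returns" via echo.
get_answer() {
  log "calculating..."
  echo 42
}

ANSWER=$(get_answer)
echo "captured: $ANSWER"
```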

The Final Script

#!/bin/bash
# Title: Perfacilis Incremental Back-up script
# Description: Create back-ups of dirs and dbs by copying them to Perfacilis' back-up servers
# We strongly recommend to put this in /etc/cron.hourly
# Author: Roy Arisse <support@perfacilis.com>
# See: https://admin.perfacilis.com
# Version: 0.5
# Usage: bash /etc/cron.hourly/backup

readonly BACKUP_LOCAL_DIR=/backup
readonly BACKUP_DIRS=(/etc /home /root /var/www $BACKUP_LOCAL_DIR)

readonly RSYNC_PROFILE="perfacilis.example@backup.perfacilis.com::perfacilis.example"
readonly RSYNC_DEFAULTS="-trlqz4 --delete --delete-excluded --prune-empty-dirs"
readonly RSYNC_EXCLUDE=(temp/ tmp/ .cache/ log/ logs/ '*.log')
readonly RSYNC_SECRET='u53Y0ur0wnPa55w0rdPlz'

readonly MYSQL="mysql --defaults-file=/etc/mysql/debian.cnf"
readonly MYSQLDUMP="mysqldump --defaults-file=/etc/mysql/debian.cnf --events --routines --max-allowed-packet=512MB --quick --quote-names --skip-comments"

readonly INTERVAL=$((3600*24))
readonly INCREMENTS=28

log() {
local MSG=`echo $1`
logger -p local0.notice -t `basename $0` -- $MSG

# Interactive shell
if tty -s; then
echo $MSG
fi
}

prepare_local_dir() {
if [ ! -d $BACKUP_LOCAL_DIR ]; then
mkdir -p $BACKUP_LOCAL_DIR
touch -d "January 1 1990" $BACKUP_LOCAL_DIR
fi
}

prepare_remote_dir() {
local RSYNC_OPTS=$(get_rsync_opts)
local EMPTYDIR=$(mktemp -d)

rsync ${RSYNC_OPTS//--delete* /} $EMPTYDIR/ $RSYNC_PROFILE/current
rm -rf $EMPTYDIR
}

check_interval() {
local LAST=$(stat -c %Y $BACKUP_LOCAL_DIR)
local NOW=$(date +%s)
local ELAPSED=$(($NOW - $LAST))

if [ "$ELAPSED" -lt "$INTERVAL" ]; then
log "Last backup was ${ELAPSED}s ago, which is less than ${INTERVAL}s."
exit
fi
}

get_next_increment() {
local LAST NEXT

if [ -f $BACKUP_LOCAL_DIR/last ]; then
LAST=$(cat $BACKUP_LOCAL_DIR/last | tr -d "\n")
fi

if [ -z "$LAST" ]; then
echo 0
return
fi

NEXT=$(($LAST+1))
if [ "$NEXT" -gt "$INCREMENTS" ]; then
echo 0
return
fi

echo $NEXT
}

get_rsync_opts() {
local EXCLUDE=`dirname $0`/rsync.exclude
local SECRET=`dirname $0`/rsync.secret

if [ ! -f $EXCLUDE ]; then
printf '%s\n' "${RSYNC_EXCLUDE[@]}" > $EXCLUDE
fi

if [ ! -f $SECRET ]; then
echo $RSYNC_SECRET > $SECRET
chmod 600 $SECRET
fi

echo "$RSYNC_DEFAULTS --exclude-from=$EXCLUDE --password-file=$SECRET"
}

backup_packagelist() {
log "Back-up list of installed packages"
dpkg --get-selections > $BACKUP_LOCAL_DIR/packagelist.txt
}

backup_mysql() {
local DB

log "Back-up mysql databases:"
for DB in `$MYSQL -e 'show databases' | grep -v 'Database'`; do
if [ $DB = 'information_schema' -o $DB = 'performance_schema' ]; then
continue
fi

log "- $DB"
$MYSQLDUMP $DB | gzip > $BACKUP_LOCAL_DIR/$DB.sql.gz
done
}

backup_folders() {
local RSYNC_OPTS=$(get_rsync_opts)
local DIR TARGET RSYNC
local INC=$(get_next_increment)
local VANISHED='^(file has vanished: |rsync warning: some files vanished before they could be transferred)'

log "Moving back-up to target: ${INC/#0/current}"
for DIR in ${BACKUP_DIRS[@]}; do
TARGET=${DIR/#\//}
TARGET=${TARGET//\//_}

RSYNC="rsync $RSYNC_OPTS"
if [ "$INC" -gt 0 ]; then
RSYNC="rsync $RSYNC_OPTS --backup --backup-dir=/$INC/$TARGET"
fi

log "- $DIR"
$RSYNC $DIR/ $RSYNC_PROFILE/current/$TARGET 2>&1 | (grep -Ev "$VANISHED" || true)
done
}

signoff_interval() {
touch $BACKUP_LOCAL_DIR
}

signoff_increment() {
echo $(get_next_increment) > $BACKUP_LOCAL_DIR/last
}

cleanup() {
rm -f `dirname $0`/rsync.exclude
rm -f `dirname $0`/rsync.secret
}

main() {
log "Backup initiated at `date`"

trap "cleanup" EXIT

prepare_local_dir
prepare_remote_dir

check_interval

backup_mysql
backup_folders

signoff_interval
signoff_increment

log "Backup completed at `date`"
}

main
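
The last line of backup_folders filters rsync's output with the $VANISHED pattern. The sketch below runs that filter on its own. The helper name filter_vanished is hypothetical, and the sample lines are made up to stand in for rsync's output.

```shell
#!/bin/bash

# Pattern copied from backup_folders in the script above.
VANISHED='^(file has vanished: |rsync warning: some files vanished before they could be transferred)'

# Hypothetical helper: drop the "vanished" notices, keep everything else.
# "|| true" keeps the status at 0 when grep filters out every line.
filter_vanished() {
  grep -Ev "$VANISHED" || true
}

# Made-up sample lines standing in for rsync's output.
printf '%s\n' \
  'file has vanished: "/home/user/.cache/session.tmp"' \
  'rsync warning: some files vanished before they could be transferred (code 24)' \
  'some other message' \
  | filter_vanished
```

Files that disappear between rsync's scan and the actual transfer are normal on a live system, so hiding only these two messages keeps cron mail quiet without hiding other errors.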

Conclusion

Even though our resulting script is a bit more intricate than other existing tools may be, the result is a fool-proof incremental back-up that you can tailor exactly to your liking. It's capable of pushing your back-ups to both local and remote targets and you can change the increments whenever you want.

Compared to the old situation, where we needed seven times the space of the source system, we can now store one full back-up and between 20 and 28 increments in the same space; that's almost a month's worth! In other words, the improvement is substantial. Nonetheless, we would like to improve the script to support setting a number of daily, weekly and monthly back-ups, to allow for a longer retention period. We're currently researching the possibilities and will post on that as soon as we've found a stable and proven solution.

A note on Duplicity

If you don't like to script so much, Duplicity is probably a good replacement for you. It supports encryption (which is a big pro), sending files over many protocols and can be set up in a single line of code. Don't want to script at all? Try Déjà Dup, this graphical interface for Duplicity makes setting up your back-up a breeze.

I prefer the manual script though. It's easy to change and I know exactly where to find what file, since the result is a simple flat file format. Duplicity creates multiple archives for each snapshot, which need to be searched through using duplicity's tools. For me, that's too much of a hassle.

Changelog

2020-06-22
Added --skip-comments option to $MYSQLDUMP to get rid of the "Dump completed on …" comment. This prevents a dump from being saved remotely again if the data itself hasn't changed.
2020-07-03
Replaced ${INC/0/current} with ${INC/#0/current}, to avoid numbers containing a "0" (such as 10) being partially replaced.
Silenced rsync's "file has vanished" errors, using an example from Benoit Jacquemont.
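
The difference between the two expansions in the 2020-07-03 entry is easy to check for yourself. A quick sketch, with increment numbers picked only for illustration:

```shell
#!/bin/bash

# Compare the old and the new expansion for a few increment numbers.
# ${INC/0/current}  replaces the first "0" anywhere in the value.
# ${INC/#0/current} replaces a "0" only at the very start of the value.
for INC in 0 7 10 20; do
  echo "$INC: old=${INC/0/current} new=${INC/#0/current}"
done
```

With the old form, increment 10 would have been logged as "1current"; the anchored form leaves it as 10 and still turns a plain 0 into "current".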