Incremental Back-up Script using Rsync (v1)

Full backups versus Incremental backups

A Full Backup is simply a one-to-one copy of all files from the source to the target, also called a mirror. For example, if you want to keep daily backups of a 100G home folder for all 7 days of the week, your backup medium needs 700G of free space. An Incremental Backup (sometimes loosely called a Differential Backup) starts with one full backup and then builds a version history in which only the changes to your files are stored. This significantly reduces the amount of disk space needed.
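To make the arithmetic concrete, here is a minimal sketch comparing the two approaches. The function names and the 5% change rate between backups are assumptions for illustration only; your real change rate will differ.

```shell
#!/bin/bash

# Space needed when every backup is a full copy.
# Arguments: source size in GB, number of backups kept.
full_backup_space() {
  local size_gb=$1 copies=$2
  echo $((size_gb * copies))
}

# Space needed for one full copy plus (copies - 1) increments.
# change_pct is the ASSUMED share of data that changes between backups.
incremental_backup_space() {
  local size_gb=$1 copies=$2 change_pct=$3
  echo $((size_gb + (copies - 1) * size_gb * change_pct / 100))
}

full_backup_space 100 7            # prints 700
incremental_backup_space 100 7 5   # prints 130
```

With these assumed numbers, a week of history costs 130G instead of 700G.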

For reference only, superseded by our latest incremental rsync backup script

This page explains in detail how an incremental backup can be made using rsync. It relies on an idea that has been changed and improved a lot.
If you don't want to learn how it works and just want a working script:

Rsync Incremental Backup Script (v2)  

Rsync full back-ups

If you have any experience with Rsync, you might know it's amazingly easy to copy your important files to a back-up location. As the simple example below shows, the logic doesn't really change much if you're using an external disk, a remote server running rsyncd, or running rsync over ssh:

# Rsync to external disk
rsync -av /home/$USER/Documents /mount/externaldisk/backup

# Rsync to server running rsyncd
rsync -avz /home/$USER/Documents username@backup.perfacilis.com::profile/Documents

# Rsync to ssh server
rsync -avz /home/$USER/Documents username@ssh.server.com:/backup/Documents

This is as basic as it gets and even though it gets the job done, it's far from a robust solution.

Backing-up multiple folders

Using the magic of a bash script, you can set up an array of the folders you want to back up and loop through them:

#!/bin/bash

readonly BACKUP_DIRS=(/etc /home/$USER /root /var/www)

readonly RSYNC_PROFILE="user@backup.perfacilis.com::profile"
readonly RSYNC_DEFAULTS="-avz"

backup_folders() {
  local DIR TARGET

  for DIR in "${BACKUP_DIRS[@]}"; do
    # Turn the path into a flat target name, e.g. /var/www becomes var_www
    TARGET=${DIR/#\//}
    TARGET=${TARGET//\//_}
    rsync $RSYNC_DEFAULTS "$DIR/" "$RSYNC_PROFILE/$TARGET"
  done
}

main() {
backup_folders
}

main

Modify the BACKUP_DIRS array to your needs, separating each folder with a space. Save the text as backup.sh and run it from your command line with bash backup.sh to start your back-up. Optionally, make the file executable by running chmod +x backup.sh, so you can run it without calling bash: ./backup.sh.
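The two parameter expansions in the loop are what turn a source path into a target folder name. As a small sketch, the same logic can be isolated in a helper (the name target_name is made up for this illustration):

```shell
#!/bin/bash

# Convert an absolute path into a flat folder name for the backup target.
target_name() {
  local dir=$1 target
  target=${dir/#\//}      # strip the leading slash: /var/www -> var/www
  target=${target//\//_}  # replace remaining slashes: var/www -> var_www
  echo "$target"
}

target_name /etc        # prints etc
target_name /var/www    # prints var_www
```

Keep in mind that this flattening can make two different paths collide: /var/www_a and /var/www/a would both end up as var_www_a.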

A note on Bash code clarity

To keep the script clean, I'll be splitting logic into functions, even though this results in more code. Don't forget the last line where main is called, otherwise nothing will happen. If your script misbehaves, use bash -x ./backup.sh for debugging. Also, you can use man bash as a language reference.

Automating your back-up

After modifying the example above to your needs, you'll have a very simple back-up solution. Personally, I don't have the discipline to manually run it every day, so let's automate it.

Schedule your back-up using Crontab (if your computer/server is always on)

First, save your back-up script in a location that works for you; for this example we'll use /home/user/backup.sh. If you've got root access, you can add your back-up script to the system-wide crontab using nano /etc/crontab. Otherwise you can modify your personal crontab using crontab -e. Then add the following line (in a personal crontab, leave out the user column, since that field only exists in the system-wide file):

1 1  * * *     user     bash /home/user/backup.sh

The example above runs every day at 01:01. To change the timing, see man 5 crontab for more details on the cron-table format.

Optionally, you can put your back-up script in the /etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly or /etc/cron.monthly folders instead of using crontab files. Ensure the script file is executable and has no special characters in its name (see man run-parts for filename restrictions), for example:

mv /home/user/backup.sh /etc/cron.daily/backup
chmod +x /etc/cron.daily/backup
chown $USER /etc/cron.daily/backup
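
The rename from backup.sh to backup matters: by default, Debian's run-parts only runs files whose names consist of letters, digits, underscores and hyphens, so a name containing a dot is silently skipped. Below is a rough sketch of that rule; the helper name is hypothetical and other distributions may apply different rules, so check man run-parts on your system.

```shell
#!/bin/bash

# Succeeds when NAME only contains letters, digits, underscores and hyphens.
valid_runparts_name() {
  case "$1" in
    ""|*[!A-Za-z0-9_-]*) return 1 ;;  # empty, or contains another character
    *) return 0 ;;
  esac
}

valid_runparts_name backup    && echo "backup: ok"
valid_runparts_name backup.sh || echo "backup.sh: would be skipped"
```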

Schedule your back-up on an Interval (optionally, if your computer is not always on)

Using the Crontab scheduler as described above has one big downside: if your computer is powered off at the scheduled time, your back-up doesn't run. With a systemd timer instead of Crontab, you can have it run after a missed schedule or even when your computer boots, but some Linux distributions don't ship systemd or are strictly against it.

By adding a bit of extra logic to our existing backup script, we can check when it last executed. On my laptop, for example, I use something like the script below, which runs at an interval of 8 hours (defined in seconds: 3600 * 8 = 28800), or as soon as possible after a missed schedule:

#!/bin/bash

readonly BACKUP_LOCAL_DIR="/home/$USER/backup/"
readonly BACKUP_DIRS=(/etc /home/$USER /root /var/www)

readonly RSYNC_PROFILE="user@backup.perfacilis.com::profile"
readonly RSYNC_DEFAULTS="-trlvz --delete --delete-excluded --prune-empty-dirs"

readonly INTERVAL=28800

prepare_local_dir() {
if [ ! -d $BACKUP_LOCAL_DIR ]; then
mkdir -p $BACKUP_LOCAL_DIR
touch -d "January 1 1990" $BACKUP_LOCAL_DIR
fi
}

check_interval() {
local LAST=$(stat -c %Y $BACKUP_LOCAL_DIR)
local NOW=$(date +%s)
local ELAPSED=$(($NOW - $LAST))

if [ "$ELAPSED" -lt "$INTERVAL" ]; then
  echo "Last backup was ${ELAPSED}s ago, which is less than ${INTERVAL}s."
  exit
fi
}

backup_folders() {
local DIR TARGET

for DIR in ${BACKUP_DIRS[@]}; do
TARGET=${DIR/#\//}
TARGET=${TARGET//\//_}
rsync $RSYNC_DEFAULTS $DIR/ $RSYNC_PROFILE/$TARGET
done
}

signoff_interval() {
touch $BACKUP_LOCAL_DIR
}

main() {
prepare_local_dir
check_interval
backup_folders
signoff_interval
}

main

The prepare_local_dir function creates a local back-up directory if it doesn't exist and sets its modified time far in the past, so the first run always proceeds. The check_interval function exits early unless the folder's modified time is at least the given interval ago. Finally, signoff_interval updates the back-up folder's modified time, so it's set up for the next execution.

You do need to change your Crontab to ensure the back-up script runs more often, for example every hour:

1 *  * * *     user     bash /home/user/backup.sh

The check_interval function ensures the time between the actual rsync runs is never less than the given $INTERVAL.
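
To try the interval check without touching your real back-up folder, the comparison can be wrapped in a small function that returns a status instead of exiting. This is only a sketch: the name interval_elapsed is made up, and like the script above it relies on GNU stat and GNU touch.

```shell
#!/bin/bash

# Succeeds (exit status 0) when the stamp's modification time is at least
# INTERVAL seconds in the past.
interval_elapsed() {
  local stamp=$1 interval=$2 last now
  last=$(stat -c %Y "$stamp")
  now=$(date +%s)
  [ $((now - last)) -ge "$interval" ]
}

# Usage with a throwaway directory instead of the real backup folder
STAMP=$(mktemp -d)
touch -d "January 1 1990" "$STAMP"
interval_elapsed "$STAMP" 28800 && echo "backup is due"
touch "$STAMP"
interval_elapsed "$STAMP" 28800 || echo "backup ran recently"
rmdir "$STAMP"
```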

Incremental back-ups with rsync

Looking through Rsync's examples page, I found the basis of the incremental back-up script below. The idea is to save the latest version of all our files in a "current" directory and move older versions into a separate, numbered increment directory. Again we'll have to add more bash logic:

#!/bin/bash

readonly BACKUP_LOCAL_DIR="/home/$USER/backup"
readonly BACKUP_DIRS=(/etc /home/$USER /root /var/www)

readonly RSYNC_PROFILE="user@backup.perfacilis.com::profile"
readonly RSYNC_DEFAULTS="-trlvz --delete --delete-excluded --prune-empty-dirs"

readonly INCREMENTS=7

prepare_local_dir() {
if [ ! -d $BACKUP_LOCAL_DIR ]; then
mkdir -p $BACKUP_LOCAL_DIR
touch -d "January 1 1990" $BACKUP_LOCAL_DIR
fi
}

prepare_remote_dir() {
local EMPTYDIR=$(mktemp -d)

rsync ${RSYNC_DEFAULTS//--delete* /} $EMPTYDIR/ $RSYNC_PROFILE/current
rm -rf $EMPTYDIR
}

get_next_increment() {
local LAST NEXT

if [ -f $BACKUP_LOCAL_DIR/last ]; then
LAST=$(cat $BACKUP_LOCAL_DIR/last | tr -d "\n")
fi

if [ -z "$LAST" ]; then
echo 0
return
fi

NEXT=$(($LAST+1))
if [ "$NEXT" -gt "$INCREMENTS" ]; then
echo 0
return
fi

echo $NEXT
}

backup_folders() {
local DIR TARGET RSYNC
local INC=$(get_next_increment)

for DIR in ${BACKUP_DIRS[@]}; do
TARGET=${DIR/#\//}
TARGET=${TARGET//\//_}

RSYNC="rsync $RSYNC_DEFAULTS"
if [ "$INC" -gt 0 ]; then
RSYNC="rsync $RSYNC_DEFAULTS --backup --backup-dir=/$INC/$TARGET"
fi

$RSYNC $DIR/ $RSYNC_PROFILE/current/$TARGET
done
}

signoff_increment() {
echo $(get_next_increment) > $BACKUP_LOCAL_DIR/last
}

main() {
prepare_local_dir
prepare_remote_dir

backup_folders

signoff_increment
}

main

First of all, prepare_remote_dir ensures the "current" directory exists in the rsync target. Then, in backup_folders, get_next_increment is called, which returns a number between 0 and $INCREMENTS. For 0, it creates a full backup of the given $DIR; for a number greater than 0, it saves the previous version of each changed file into the $INC folder and updates the files in the "current" directory to reflect the latest version.
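
One detail in prepare_remote_dir deserves a closer look: it strips the delete options with ${RSYNC_DEFAULTS//--delete* /}, since syncing an empty directory with --delete would wipe "current". Because * matches as much as possible, the pattern removes everything from the first --delete up to the last space. That works for the defaults used here, but it would also swallow unrelated options placed after a --delete option, and it would miss a --delete option at the very end of the string. A more explicit alternative could look like the sketch below (the function name is hypothetical):

```shell
#!/bin/bash

# Drop every option that starts with --delete, keep the rest in order.
strip_delete_opts() {
  local opt out=""
  for opt in $1; do          # intentional word splitting on spaces
    case "$opt" in
      --delete*) ;;          # skip --delete, --delete-excluded, ...
      *) out="$out $opt" ;;
    esac
  done
  echo "${out# }"            # trim the leading space
}

strip_delete_opts "-trlvz --delete --delete-excluded --prune-empty-dirs"
# prints: -trlvz --prune-empty-dirs
```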

For local back-ups (e.g. an external HDD), we could easily list existing increments in the target to determine the next increment. Listing the contents of a remote rsync target from a script is less convenient, so we keep track of the last increment locally, in the "last" file. The signoff_increment function keeps this file up to date. Once the number of $INCREMENTS is reached, the counter resets to 0 and a new full backup is made into "current", thus rotating it all. This method is safe to use with both remote and local targets.

You're free to change the number of increments at any point in time. Increasing it will create more numeric increment folders. Lowering it won't delete the higher-numbered folders though; they are simply skipped.
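
The rotation is easiest to see in isolation. The sketch below repeats the logic of get_next_increment as a pure function that takes the last value and the maximum as arguments (the name next_increment and the argument order are assumptions for this example), and simulates a number of runs:

```shell
#!/bin/bash

# Given the last increment (may be empty) and the maximum, print the next one.
next_increment() {
  local last=$1 max=$2 next
  if [ -z "$last" ]; then
    echo 0
    return
  fi
  next=$((last + 1))
  if [ "$next" -gt "$max" ]; then
    echo 0
    return
  fi
  echo "$next"
}

# Simulate nine runs with a maximum of 3 increments
LAST=""
for RUN in 1 2 3 4 5 6 7 8 9; do
  LAST=$(next_increment "$LAST" 3)
  printf '%s ' "$LAST"
done
echo
# prints: 0 1 2 3 0 1 2 3 0
```

Note that a maximum of 3 yields four states (0 through 3), so INCREMENTS=7 in the script above cycles through eight runs before starting over.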

The missing pieces

Rsync profile password

When using a remote rsync profile, it usually needs authentication. Instead of using the RSYNC_PASSWORD variable, it's safer to set a password file:

readonly RSYNC_SECRET='u53Y0ur0wnPa55w0rdPlz'
readonly RSYNC_DEFAULTS="-trlqz4 --delete --delete-excluded --prune-empty-dirs"

get_rsync_opts() {
local SECRET=`dirname $0`/rsync.secret

if [ ! -f $SECRET ]; then
echo $RSYNC_SECRET > $SECRET
chmod 600 $SECRET
fi

echo "$RSYNC_DEFAULTS --password-file=$SECRET"
}

backup_folders() {
local RSYNC_OPTS=$(get_rsync_opts)

# --password-file only applies when the target is an rsync daemon
rsync $RSYNC_OPTS /home/$USER/ $RSYNC_PROFILE/home
}

cleanup() {
rm -f `dirname $0`/rsync.exclude
rm -f `dirname $0`/rsync.secret
}

main() {
trap "cleanup" EXIT

backup_folders
}

Where we've been using $RSYNC_DEFAULTS until now, we need to use $(get_rsync_opts) instead. See the simplified example above or the final script below. Also, a trap is added to the main function to ensure rsync.exclude and rsync.secret are removed when the script exits, even if it is interrupted.
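
The example above writes the secret first and restricts its permissions afterwards. As a possible refinement, the file can be created with restrictive permissions from the start by setting umask in a subshell. This is a sketch with hypothetical helper names (write_secret, remove_secret), using a temporary directory and a dummy password rather than the real script location:

```shell
#!/bin/bash

# Write SECRET to FILE so that it is only readable by the current user.
write_secret() {
  local file=$1 secret=$2
  (
    umask 077                          # new files are created as 0600
    printf '%s\n' "$secret" > "$file"
  )
  chmod 600 "$file"                    # also covers a pre-existing file
}

# Remove the secret file again; meant to be called from an EXIT trap.
remove_secret() {
  rm -f "$1"
}

# Usage in a subshell, so the trap fires as soon as the subshell ends
DEMO_DIR=$(mktemp -d)
(
  trap 'remove_secret "$DEMO_DIR/rsync.secret"' EXIT
  write_secret "$DEMO_DIR/rsync.secret" "dummy-password"
  ls -l "$DEMO_DIR/rsync.secret"
)
rmdir "$DEMO_DIR"
```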

MySQL backup

If you're using MySQL, you can add this simple backup_mysql function to create gzipped dumps of your databases. Don't forget to call it in the main function though.

readonly MYSQL="mysql --defaults-file=/etc/mysql/debian.cnf"
readonly MYSQLDUMP="mysqldump --defaults-file=/etc/mysql/debian.cnf --events --routines --max-allowed-packet=512MB --quick --quote-names --skip-comments"

backup_mysql() {
local DB

for DB in `$MYSQL -e 'show databases' | grep -v 'Database'`; do
if [ $DB = 'information_schema' -o $DB = 'performance_schema' ]; then
continue
fi

$MYSQLDUMP $DB | gzip > $BACKUP_LOCAL_DIR/$DB.sql.gz
done
}
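
If the list of skipped schemas grows, the check can be moved into a small helper. The sketch below shows one way to do that; the function name should_dump and the database names shop and blog are made up for the example.

```shell
#!/bin/bash

# Succeeds for databases that should be dumped, fails for the virtual
# schemas that backup_mysql skips.
should_dump() {
  case "$1" in
    information_schema|performance_schema) return 1 ;;
    *) return 0 ;;
  esac
}

for DB in information_schema shop performance_schema blog; do
  if should_dump "$DB"; then
    echo "would dump $DB"
  fi
done
# prints:
# would dump shop
# would dump blog
```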

Logging

You usually want cronjobs to remain silent, but you also want to be able to look up what your back-up script actually did or is doing. Therefore a simple log function can be added to forward messages to the system log. It only prints to the screen if you run the script manually from an active terminal.

function log() {
local MSG=`echo $1`
logger -p local0.notice -t `basename $0` -- $MSG

if tty -s; then
echo $MSG
fi
}

log "Back-up initiated at $(date)"

Keep in mind not to call log in get_next_increment or get_rsync_opts, because their output is captured with $(...) and anything log echoes would end up in the captured value.
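
One way around this restriction, sketched below, is to print log messages to stderr instead of stdout, so that command substitution only captures the function's real output. This is a variation on the log function above, not the version used in the final script; get_answer is a made-up stand-in for functions like get_next_increment.

```shell
#!/bin/bash

log() {
  local msg="$*"
  # Send to syslog when logger is available, as in the function above.
  if command -v logger >/dev/null 2>&1; then
    logger -p local0.notice -t "$(basename -- "$0")" -- "$msg" 2>/dev/null || true
  fi
  # Print to stderr, not stdout, so $(...) captures stay clean.
  if [ -t 2 ]; then
    echo "$msg" >&2
  fi
  return 0
}

get_answer() {
  log "computing the answer"
  echo 42
}

ANSWER=$(get_answer)
echo "captured: $ANSWER"   # prints captured: 42
```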

The Final Script…

Can either be found in our blogpost about our Improved Rsync Backup Script, or you can find the latest version on GitHub.

Conclusion

Even though our resulting script is a bit more intricate than other existing tools may be, the result is a fool-proof incremental back-up that you can tailor exactly to your liking. It's capable of pushing your back-ups to both local and remote targets and you can change the number of increments whenever you want.

Compared to the old situation, where we needed seven times the space of the source system, we can now store one full backup and between 20 and 28 increments in the same space. That's almost a month's worth! In other words, the improvement is substantial. Nonetheless, we would like to improve the script to support a configurable number of daily, weekly and monthly back-ups, to allow for a longer retention period. We're currently researching the possibilities and will post on that as soon as we've found a stable and proven solution.
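
How many increments fit depends entirely on how much data changes between runs. As a rough sketch, using the 100G source and 700G medium from the introduction and assumed increment sizes of 30G and 21G (illustrative figures that happen to reproduce the range above, not measurements):

```shell
#!/bin/bash

# Number of increments that fit next to one full backup.
# Arguments: total space, size of the full backup, ASSUMED size per
# increment (all in GB).
increments_that_fit() {
  local total_gb=$1 full_gb=$2 increment_gb=$3
  echo $(( (total_gb - full_gb) / increment_gb ))
}

increments_that_fit 700 100 30   # prints 20
increments_that_fit 700 100 21   # prints 28
```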

A note on Duplicity

If you don't like to script so much, Duplicity is probably a good replacement for you. It supports encryption (which is a big pro), sending files over many protocols and can be set up in a single line of code. Don't want to script at all? Try Déjà Dup, this graphical interface for Duplicity makes setting up your back-up a breeze.

I prefer the manual script though. It's easy to change and I know exactly where to find each file, since the result is a plain directory tree. Duplicity creates multiple archives for each snapshot, which need to be searched through using duplicity's tools. For me, that's too much of a hassle.

Perfacilis Back-up Service

The Perfacilis Back-up service alerts you when a back-up didn't finish in the scheduled time-frame, or if a lot of data changes at once (which might indicate an encrypter virus). You only pay for the amount of space your back-ups use.

Want to learn more?

Contact us

Changelog

2022-06-29
Removed the complete script and added reference to our latest Blogpost or GitHub.
2020-06-22
Added the --skip-comments option to $MYSQLDUMP to get rid of the "Dump completed on …" comment. This avoids uploading a dump again when the data itself hasn't changed.
2020-07-03
Replaced ${INC/0/current} with ${INC/#0/current}, so that numbers containing a "0" (such as 10) are no longer affected.
Silenced rsync's "file has vanished" errors, using an example from Benoit Jacquemont.