We've been making back-ups for others for quite some years now. Previously, we made full back-ups: a simple copy of the entire source system for every day of the week, which meant the back-ups took up seven times the size of the source system. To improve on cost and efficiency, we switched over to incremental back-ups.
This article shows how we built a solution to create incremental back-ups over rsync to a remote location. If you're not interested in how it came together, just copy The Final Script, modify it to your needs and off you go.
This page is an explanation only
This page explains in detail how an incremental back-up can be made using rsync. It describes an idea that has since been changed and improved a lot.
If you don't want to learn how it works and just want a working script:
Improved Rsync Incremental Backup script
Basic full back-ups
If you have any experience with Rsync, you might know it's amazingly easy to copy your important files to a back-up location. As the simple example below shows, the logic doesn't really change much if you're using an external disk, a remote server running rsyncd, or running rsync over ssh:
# Rsync to external disk
rsync -av /home/$USER/Documents /mount/externaldisk/backup
# Rsync to server running rsyncd
rsync -avz /home/$USER/Documents username@backup.perfacilis.com::profile/Documents
# Rsync to ssh server
rsync -avz /home/$USER/Documents username@ssh.server.com:/backup/Documents
This is as basic as it gets; even though it does the job, it's far from a robust solution.
Backing up multiple folders
Using the magic of a bash script, you can set up an array for the folders you want to back up and loop through them:
#!/bin/bash

readonly BACKUP_DIRS=(/etc /home/$USER /root /var/www)
readonly RSYNC_PROFILE="user@backup.perfacilis.com::profile"
readonly RSYNC_DEFAULTS="-avz"

backup_folders() {
  local DIR TARGET

  for DIR in ${BACKUP_DIRS[@]}; do
    # Turn e.g. /home/user into home_user for a flat target name
    TARGET=${DIR/#\//}
    TARGET=${TARGET//\//_}

    rsync $RSYNC_DEFAULTS $DIR/ $RSYNC_PROFILE/$TARGET
  done
}

main() {
  backup_folders
}

main
Modify the BACKUP_DIRS array to your needs (simply separate each folder with a space), save the text as backup.sh and run it from your command line with bash backup.sh. Optionally, you can make the file executable with chmod +x backup.sh; then you can run it without calling bash: ./backup.sh.
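As an aside, the two parameter expansions in backup_folders are what turn each source path into a flat target name, for example (see the Shell Parameter Expansion reference in the sources below):

DIR=/home/user/Documents
TARGET=${DIR/#\//}      # strip the leading slash: home/user/Documents
TARGET=${TARGET//\//_}  # replace the remaining slashes: home_user_Documents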
A note on Bash code clarity
To keep the script clean, I'll be splitting logic into functions, even though this results in more code. Don't forget the last line where main is called, otherwise nothing will happen. If your script misbehaves, use bash -x ./backup.sh for debugging. You can also use man bash as a language reference.
Automating your back-up
After modifying the example above to your needs, you'll have a very simple back-up solution. Personally, I don't have the discipline to manually run it every day, so let's automate it.
Schedule your back-up using Crontab (if your computer/server is always on)
First, ensure you've saved your back-up script in a location that works for you; for this example we'll use /home/user/backup.sh. If you've got root access, you can put your back-up script in the system-wide crontab using nano /etc/crontab. Otherwise, you can modify your personal crontab using crontab -e. Then add the following line:
1 1 * * * user bash /home/user/backup.sh
Above example runs every day at 01:01. Note that the user field only applies to the system-wide /etc/crontab, so omit it in a personal crontab. To change the timing, see man 5 crontab for more details on the cron-table format.
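For quick reference, the five scheduling fields map as follows:

# ┌─────────── minute (0-59)
# │ ┌───────── hour (0-23)
# │ │ ┌─────── day of month (1-31)
# │ │ │ ┌───── month (1-12)
# │ │ │ │ ┌─── day of week (0-7, Sunday is both 0 and 7)
# │ │ │ │ │
  1 1 * * * user bash /home/user/backup.sh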
Optionally, you can put your back-up script in the /etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly or /etc/cron.monthly folders instead of using crontab files. Ensure the script file is executable and its name has no special characters (see man run-parts for filename restrictions), for example:
mv /home/user/backup.sh /etc/cron.daily/backup
chmod +x /etc/cron.daily/backup
chown $USER /etc/cron.daily/backup
Schedule your back-up on an interval (optional, if your computer is not always on)
Using the Crontab scheduler as described above has one big downside: if your computer is powered off at the scheduled time, your back-up doesn't run. Using a systemd timer instead of Crontabs, you can set it to run when a schedule is missed or even when your computer boots, but some Linux distributions don't ship systemd or are strictly against it.
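For completeness, if your distribution does have systemd, a minimal timer sketch could look like the units below (unit names and paths are assumptions, not part of the original setup); Persistent=true makes a missed schedule fire as soon as the machine is powered on again:

# /etc/systemd/system/backup.service (hypothetical unit)
[Unit]
Description=Incremental back-up

[Service]
Type=oneshot
ExecStart=/bin/bash /home/user/backup.sh

# /etc/systemd/system/backup.timer (hypothetical unit)
[Unit]
Description=Run the back-up daily, catching up missed runs

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target

Enable it with systemctl enable --now backup.timer.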
By adding a bit of extra logic to our existing back-up script, we can check when it last executed. On my laptop, for example, I use something like the example below with an interval of 8 hours (the interval is defined in seconds: 3600 * 8 = 28800), so the back-up runs as soon as a schedule has been missed:
#!/bin/bash

readonly BACKUP_LOCAL_DIR="/home/$USER/backup/"
readonly BACKUP_DIRS=(/etc /home/$USER /root /var/www)
readonly RSYNC_PROFILE="user@backup.perfacilis.com::profile"
readonly RSYNC_DEFAULTS="-trlvz --delete --delete-excluded --prune-empty-dirs"
readonly INTERVAL=28800

prepare_local_dir() {
  if [ ! -d $BACKUP_LOCAL_DIR ]; then
    mkdir -p $BACKUP_LOCAL_DIR
    # Backdate the mtime so the first run always passes the interval check
    touch -d "January 1 1990" $BACKUP_LOCAL_DIR
  fi
}

check_interval() {
  local LAST=$(stat -c %Y $BACKUP_LOCAL_DIR)
  local NOW=$(date +%s)
  local ELAPSED=$(($NOW - $LAST))

  if [ "$ELAPSED" -lt "$INTERVAL" ]; then
    echo "Last backup was ${ELAPSED}s ago, which is less than ${INTERVAL}s."
    exit
  fi
}

backup_folders() {
  local DIR TARGET

  for DIR in ${BACKUP_DIRS[@]}; do
    TARGET=${DIR/#\//}
    TARGET=${TARGET//\//_}

    rsync $RSYNC_DEFAULTS $DIR/ $RSYNC_PROFILE/$TARGET
  done
}

signoff_interval() {
  # Update the directory's mtime to mark a completed run
  touch $BACKUP_LOCAL_DIR
}

main() {
  prepare_local_dir
  check_interval
  backup_folders
  signoff_interval
}

main
The prepare_local_dir function creates a local back-up directory if it doesn't exist. The check_interval function checks whether the folder's modified time is longer ago than the given interval. Finally, signoff_interval updates the back-up folder's modified time again, so it's set up for the next execution.
You do need to change your Crontab to ensure the back-up script runs more often, for example every hour:
1 * * * * user bash /home/user/backup.sh
The check_interval function ensures the time between actual rsync runs is never less than the given $INTERVAL.
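If you want to verify the interval logic by hand, the same arithmetic can be done on the command line (a quick sketch, assuming the directory from the script above):

LAST=$(stat -c %Y /home/$USER/backup/)  # sign-off time, in seconds since the epoch
NOW=$(date +%s)                         # current time, in the same unit
echo "$((NOW - LAST))s since the last back-up"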
Incremental back-ups with rsync
Looking through Rsync's examples page, I found the basis of the incremental back-up script below. The idea is to keep the latest version of all our files in a "current" directory and move older file versions into separate, numbered increment directories. Again we'll have to add more bash logic:
#!/bin/bash

readonly BACKUP_LOCAL_DIR="/home/$USER/backup"
readonly BACKUP_DIRS=(/etc /home/$USER /root /var/www)
readonly RSYNC_PROFILE="user@backup.perfacilis.com::profile"
readonly RSYNC_DEFAULTS="-trlvz --delete --delete-excluded --prune-empty-dirs"
readonly INCREMENTS=7

prepare_local_dir() {
  if [ ! -d $BACKUP_LOCAL_DIR ]; then
    mkdir -p $BACKUP_LOCAL_DIR
    touch -d "January 1 1990" $BACKUP_LOCAL_DIR
  fi
}

prepare_remote_dir() {
  # Sync an empty directory to ensure "current" exists on the target
  local EMPTYDIR=$(mktemp -d)
  rsync ${RSYNC_DEFAULTS//--delete* /} $EMPTYDIR/ $RSYNC_PROFILE/current
  rm -rf $EMPTYDIR
}

get_next_increment() {
  local LAST NEXT

  if [ -f $BACKUP_LOCAL_DIR/last ]; then
    LAST=$(cat $BACKUP_LOCAL_DIR/last | tr -d "\n")
  fi

  if [ -z "$LAST" ]; then
    echo 0
    return
  fi

  NEXT=$(($LAST+1))
  if [ "$NEXT" -gt "$INCREMENTS" ]; then
    echo 0
    return
  fi

  echo $NEXT
}

backup_folders() {
  local DIR TARGET RSYNC
  local INC=$(get_next_increment)

  for DIR in ${BACKUP_DIRS[@]}; do
    TARGET=${DIR/#\//}
    TARGET=${TARGET//\//_}

    RSYNC="rsync $RSYNC_DEFAULTS"
    if [ "$INC" -gt 0 ]; then
      # Move replaced and deleted files into the increment folder
      RSYNC="rsync $RSYNC_DEFAULTS --backup --backup-dir=/$INC/$TARGET"
    fi

    $RSYNC $DIR/ $RSYNC_PROFILE/current/$TARGET
  done
}

signoff_increment() {
  echo $(get_next_increment) > $BACKUP_LOCAL_DIR/last
}

main() {
  prepare_local_dir
  prepare_remote_dir
  backup_folders
  signoff_increment
}

main
First of all, prepare_remote_dir ensures the "current" directory exists in the rsync target. Then, in backup_folders, get_next_increment is called, which returns a number between 0 and $INCREMENTS. For 0, it creates a full back-up of the given $DIR; for a number greater than 0, it saves the previous file versions into the $INC folder and updates the files in the "current" directory to reflect the latest version.
For local back-ups (e.g. to an external hdd), we could easily list the existing increments in the target to determine the next one. Too bad we can't list the contents of remote rsync targets, so we have to keep track of the last increment locally, which is done with the "last" file. The signoff_increment function keeps this file up to date. Once $INCREMENTS is reached, the script re-creates a full back-up into "current", thus rotating/resetting it all. This method is safe to use with both remote and local targets.
You're free to change the number of increments at any point in time. Increasing it will create more numeric increment folders. Lowering it won't delete the higher-numbered folders though; they're simply skipped.
The missing pieces
Rsync profile password
When using a remote rsync profile, it usually requires authentication. Instead of using the RSYNC_PASSWORD environment variable, it's safer to set up a password file:
readonly RSYNC_SECRET='u53Y0ur0wnPa55w0rdPlz'
readonly RSYNC_DEFAULTS="-trlqz4 --delete --delete-excluded --prune-empty-dirs"

get_rsync_opts() {
  local SECRET=`dirname $0`/rsync.secret

  # Write the password file on first use, readable by the owner only
  if [ ! -f $SECRET ]; then
    echo $RSYNC_SECRET > $SECRET
    chmod 600 $SECRET
  fi

  echo "$RSYNC_DEFAULTS --password-file=$SECRET"
}

backup_folders() {
  local RSYNC_OPTS=$(get_rsync_opts)
  rsync $RSYNC_OPTS /home/$USER /mount/externaldisk/backup/
}

cleanup() {
  rm -f `dirname $0`/rsync.exclude
  rm -f `dirname $0`/rsync.secret
}

main() {
  trap "cleanup" EXIT
  backup_folders
}

main
Where we've been using $RSYNC_DEFAULTS until now, we now need to use $(get_rsync_opts) instead; see the simplified example above or the final script below. Also, a trap is added to the main function to ensure rsync.exclude and rsync.secret are removed, even if the script is killed.
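Applied to the incremental script, the rsync invocation in backup_folders would then read something like this (a sketch of the substitution):

RSYNC="rsync $(get_rsync_opts)"
if [ "$INC" -gt 0 ]; then
  RSYNC="rsync $(get_rsync_opts) --backup --backup-dir=/$INC/$TARGET"
fi

$RSYNC $DIR/ $RSYNC_PROFILE/current/$TARGET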
MySQL back-up
If you're using MySQL, you can add this simple backup_mysql function to create gzipped dumps of your databases. Don't forget to call it in the main function though.
readonly MYSQL="mysql --defaults-file=/etc/mysql/debian.cnf"
readonly MYSQLDUMP="mysqldump --defaults-file=/etc/mysql/debian.cnf --events --routines --max-allowed-packet=512MB --quick --quote-names --skip-comments"

backup_mysql() {
  local DB

  for DB in `$MYSQL -e 'show databases' | grep -v 'Database'`; do
    # Skip MySQL's internal schemas
    if [ $DB = 'information_schema' -o $DB = 'performance_schema' ]; then
      continue
    fi

    $MYSQLDUMP $DB | gzip > $BACKUP_LOCAL_DIR/$DB.sql.gz
  done
}
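Restoring one of these dumps is the reverse operation; a minimal sketch, assuming a hypothetical database called mydb:

# The database must exist before piping the dump back in
$MYSQL -e 'CREATE DATABASE IF NOT EXISTS mydb'
gunzip < $BACKUP_LOCAL_DIR/mydb.sql.gz | $MYSQL mydb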
Logging
You usually want cronjobs to remain silent, but you also want to be able to look up what your back-up script actually did or is doing. Therefore, a simple log function can be added that forwards output to the system log. It only echoes output when you run the script manually from an active terminal.
log() {
  local MSG="$1"

  # Send to syslog; echo only when run from an interactive terminal
  logger -p local0.notice -t `basename $0` -- $MSG

  if tty -s; then
    echo $MSG
  fi
}
log "Back-up initiated at $(date)"
Keep in mind not to call log in get_next_increment or get_rsync_opts, because their callers capture what they echo.
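It is safe to call log from backup_folders, though; for instance (a sketch):

backup_folders() {
  local DIR TARGET
  local INC=$(get_next_increment)

  for DIR in ${BACKUP_DIRS[@]}; do
    log "Backing up $DIR (increment $INC)"
    # ... rsync call as before ...
  done
}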
The Final Script…
Can either be found in our blogpost about our Improved Rsync Backup Script, or you can find the latest version on GitHub.
Conclusion
Even though our resulting script is a bit more intricate than other existing tools may be, the result is a foolproof incremental back-up that you can tailor exactly to your liking. It's capable of pushing your back-ups to both local and remote targets, and you can change the number of increments whenever you want.
Compared to the old situation, where we needed seven times the space of the source system, we can now store one full back-up and between 20 and 28 increments in the same space: almost a month's worth! In other words, the improvement is substantial. Nonetheless, we would like to extend the script to support a configurable number of daily, weekly and monthly back-ups, to allow for a longer retention period. We're currently researching the possibilities and will post about it as soon as we've found a stable and proven solution.
A note on Duplicity
If you'd rather not script this much, Duplicity is probably a good replacement for you. It supports encryption (which is a big pro), sends files over many protocols and can be set up in a single line of code. Don't want to script at all? Try Déjà Dup; this graphical interface for Duplicity makes setting up your back-up a breeze.
I prefer the manual script though. It's easy to change and I know exactly where to find which file, since the result is a simple flat-file layout. Duplicity creates multiple archives for each snapshot, which need to be searched through using Duplicity's tools. For me, that's too much of a hassle.
Perfacilis Back-up Service
The Perfacilis Back-up service alerts you when a back-up didn't finish in the scheduled time-frame, or when a lot of data changes at once (which might indicate ransomware). You only pay for the amount of space your back-ups use.
Sources
- Defensive BASH programming
- Easy Automated Snapshot-Style Backups with Linux and Rsync
- Shell Parameter Expansion
Changelog
- 2022-06-29
  - Removed the complete script and added a reference to our latest blogpost and GitHub.
- 2020-06-22
  - Added the --skip-comments option to $MYSQLDUMP to get rid of the "Dump completed on …" comment. This avoids a dump being saved remotely if the data itself hasn't changed.
- 2020-07-03
  - Replaced ${INC/0/current} with ${INC/#0/current}, to avoid numbers ending in "0" being replaced.
  - Silenced rsync's "file has vanished" errors, using an example from Benoit Jacquemont.