Full backups versus Incremental backups
A full backup is simply a one-to-one copy of all files from the source to the target, also known as a mirror. For example, if you want to keep a separate backup of your 100G home folder for each of the 7 days of the week, your backup medium needs 700G of free space. An incremental (or differential) backup starts with one full backup and then builds a version history in which only the changes to your files are stored. This significantly reduces the amount of disk space needed.
For reference only: this page has been superseded by our latest incremental rsync backup script.
This page explains in detail how an incremental backup can be made using rsync. The idea behind it has since been changed and improved considerably.
If you don't want to learn how it works and just want a working script, use our latest version instead.
Rsync full back-ups
If you have any experience with rsync, you might know it's amazingly easy to copy your important files to a back-up location. The logic doesn't really change whether you're using an external disk, a remote server running rsyncd, or rsync over ssh; only the target differs.
A single rsync command is as basic as it gets, and even though it gets the job done, it's far from a robust solution.
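For illustration, here is a minimal sketch of such a one-liner, wrapped in a hypothetical full_backup function. The options (--archive --delete, i.e. a mirror) and all hosts and paths are assumptions, not necessarily the exact command used originally:

```shell
#!/usr/bin/env bash
# Minimal full (mirror) backup: make TARGET an exact copy of SOURCE.
#   --archive  keeps permissions, timestamps, symlinks and (as root) ownership
#   --delete   removes files from TARGET that no longer exist in SOURCE
# full_backup is a hypothetical helper name.
full_backup() {
    local source=$1 target=$2
    # The trailing slash on the source copies its *contents* into target,
    # instead of creating target/<basename of source>
    rsync --archive --delete "$source/" "$target"
}

# Only the target syntax changes per transport (hosts and paths are hypothetical):
#   full_backup /home/user /media/external/backup      # external disk
#   full_backup /home/user rsync://user@server/backup  # rsyncd module "backup"
#   full_backup /home/user user@server:/srv/backup     # rsync over ssh
```

Because of --delete, a file you remove from the source also disappears from the mirror on the next run, which is one reason a plain mirror is not a robust back-up.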
Backing-up multiple folders
Using the magic of a bash script, you can set up an array of the folders you want to back up and loop through them.
Modify the BACKUP_DIRS array to your needs, separating each folder with a space, save the text as backup.sh, and run it from your command line: bash backup.sh. Optionally, you can make the file executable by running chmod +x backup.sh, so you can run it without calling bash: ./backup.sh.
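A sketch of what backup.sh could look like: an array, a loop, and a main function called on the last line. The target /media/external/backup and the folder names are placeholders, and both the --relative option and the check that skips missing folders are choices of this sketch rather than requirements:

```shell
#!/usr/bin/env bash
# Back up several folders with rsync (example paths; adjust to your needs).
BACKUP_DIRS=("/home/user/Documents" "/home/user/Pictures")
TARGET="/media/external/backup"        # hypothetical target
RSYNC_DEFAULTS=(--archive --delete)

backup_folders() {
    local dir
    for dir in "${BACKUP_DIRS[@]}"; do
        # Skip entries that don't exist instead of letting rsync fail on them
        if [[ ! -d "$dir" ]]; then
            echo "Skipping missing folder: $dir" >&2
            continue
        fi
        # --relative recreates the full source path below $TARGET, e.g.
        # /home/user/Documents -> $TARGET/home/user/Documents
        rsync "${RSYNC_DEFAULTS[@]}" --relative "$dir" "$TARGET/"
    done
}

main() {
    backup_folders
}

# The last line: without it, running the script does nothing
main "$@"
```

Using an array with quoted elements also keeps folder names that contain spaces intact.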
A note on Bash code clarity
To keep the script clean, I'll be splitting logic into functions, even though this results in more code. Don't forget the last line, where main is called; otherwise nothing will happen. If your script behaves oddly, run it with bash -x ./backup.sh, which prints each command as it executes. You can also use man bash as a language reference.
Automating your back-up
After modifying the example above to your needs, you'll have a very simple back-up solution. Personally, I don't have the discipline to manually run it every day, so let's automate it.
Schedule your back-up using Crontab (if your computer/server is always on)
First, make sure you've saved your back-up script in a location that works for you; for this example we'll use /home/user/backup.sh. If you've got root access, you can put your back-up script in the system-wide crontab using nano /etc/crontab. Otherwise, you can modify your personal crontab using crontab -e. Then add the following line:
1 1 * * * user bash /home/user/backup.sh
In a personal crontab (crontab -e), leave out the user field: 1 1 * * * bash /home/user/backup.sh
The example above runs every day at 01:01. To change the timing, see man 5 crontab for details on the crontab format.
Optionally, you can put your back-up script in the /etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly or /etc/cron.monthly folders instead of using crontab files. Ensure the script file is executable and its name contains no special characters (see man run-parts for the filename restrictions). Debian's run-parts, for example, skips names containing a dot, so save the script as /etc/cron.daily/backup rather than backup.sh.
Schedule your back-up on an Interval (optionally, if your computer is not always on)
Using crontab as described above has one big downside: if your computer is powered off at the scheduled time, your back-up doesn't run. A systemd timer can be set to run when a schedule was missed or when your computer boots, but some Linux distributions don't ship systemd or are strictly against it.
By adding a bit of extra logic to our existing backup script, we can check when it last ran. On my laptop, for example, I use an interval of 8 hours (the interval is defined in seconds: 3600 * 8 = 28800), so the back-up runs at most once every 8 hours, or as soon as possible after a missed schedule.
The prepare_local_dir function creates a local back-up directory if it doesn't exist. The check_interval function checks whether the folder's modification time is longer ago than the given interval. Finally, signoff_interval updates the back-up folder's modification time again, so it's set up for the next execution.
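A minimal sketch of these three functions, assuming GNU coreutils (stat -c %Y and touch -d @EPOCH) and a hypothetical state folder $HOME/.backup. One assumption worth pointing out: a freshly created folder is backdated to the epoch, so the very first run is never skipped:

```shell
#!/usr/bin/env bash
# Run the real back-up at most once per INTERVAL seconds, using the
# modification time of a local state folder as "time of the last run".
# Assumes GNU coreutils; LOCAL_DIR is a hypothetical location.
LOCAL_DIR="$HOME/.backup"
INTERVAL=$(( 3600 * 8 ))   # 8 hours = 28800 seconds

prepare_local_dir() {
    if [[ ! -d "$LOCAL_DIR" ]]; then
        mkdir -p "$LOCAL_DIR"
        # Backdate a new folder to the epoch so the first run is never skipped
        touch -d @0 "$LOCAL_DIR"
    fi
}

# Succeeds (status 0) when the last run is at least INTERVAL seconds ago
check_interval() {
    local now last
    now=$(date +%s)
    last=$(stat -c %Y "$LOCAL_DIR")   # mtime in seconds since the epoch
    (( now - last >= INTERVAL ))
}

# Mark "now" as the moment of the last successful back-up
signoff_interval() {
    touch "$LOCAL_DIR"
}

main() {
    prepare_local_dir
    if ! check_interval; then
        exit 0   # interval hasn't passed yet: stay silent for cron
    fi
    # ... rsync the folders here ...
    signoff_interval
}
# backup.sh would end with: main "$@"
```

With an hourly cron entry, most runs simply exit early until 8 hours have passed since the last successful back-up.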
You do need to change your Crontab to ensure the back-up script runs more often, for example every hour:
1 * * * * user bash /home/user/backup.sh
The check_interval function ensures the time between actual rsync runs is never less than the given $INTERVAL.
Incremental back-ups with rsync
Looking through rsync's examples page, I found the basis of the incremental back-up script. The idea is to keep the latest version of all files in a "current" directory and move older versions of changed or deleted files into a separate, numbered increment directory. Again, we'll have to add more bash logic.
First of all, prepare_current_dir ensures the "current" directory exists in the rsync target. Then, in backup_folders, get_next_increment is called, which returns a number between 0 and $INCREMENTS. For 0, it creates a full backup of the given $DIR; a number greater than 0 saves the previous version into the $INC folder and updates the files in the "current" directory to reflect the latest version.
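A sketch of one incremental pass using rsync's --backup and --backup-dir options, which is one way to get this behaviour, not necessarily the author's exact command. The directory layout ($TARGET/current plus numbered folders next to it), the helper backup_folder and all paths are assumptions, and prepare_current_dir is left out of this sketch:

```shell
#!/usr/bin/env bash
# Assumed layout:
#   $TARGET/current/...  latest version of every file
#   $TARGET/<n>/...      files that pass <n> replaced or deleted in "current"
TARGET="/media/external/backup"    # hypothetical; could also be user@server:/srv/backup
BACKUP_DIRS=("/home/user/Documents" "/home/user/Pictures")
RSYNC_DEFAULTS=(--archive --relative --delete)

# backup_folder DIR INC  (hypothetical helper)
backup_folder() {
    local dir=$1 inc=$2
    if (( inc == 0 )); then
        # Increment 0: plain mirror into "current" (the full backup)
        rsync "${RSYNC_DEFAULTS[@]}" "$dir" "$TARGET/current/"
    else
        # --backup moves every file that would be overwritten or deleted into
        # --backup-dir; a relative --backup-dir is resolved against the
        # destination ($TARGET/current), so "../$inc" means $TARGET/$inc
        rsync "${RSYNC_DEFAULTS[@]}" --backup --backup-dir="../$inc" \
            "$dir" "$TARGET/current/"
    fi
}

backup_folders() {
    local inc=$1 dir
    for dir in "${BACKUP_DIRS[@]}"; do
        backup_folder "$dir" "$inc"
    done
}
```

A cycle then runs backup_folders 0, backup_folders 1, and so on. Note that this sketch does not empty an old increment folder before it is reused in the next cycle.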
For local back-ups (e.g. an external hdd), we could simply list the existing increments in the target to determine the next increment. Unfortunately, we can't simply list the contents of remote rsync targets from the script, so we keep track of the last increment locally, in the "last" file. The signoff_increment function keeps this file up to date. Once $INCREMENTS is reached, the script re-creates a new full backup in "current", thus rotating/resetting it all. This method is safe to use with both remote and local targets.
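A sketch of the bookkeeping side, assuming a hypothetical $HOME/.backup/last file and INCREMENTS=28. get_next_increment only echoes its result, so it can be captured with $(...):

```shell
#!/usr/bin/env bash
# Track the last increment in a local "last" file (hypothetical path),
# so the remote target never has to be listed.
LAST_FILE="$HOME/.backup/last"
INCREMENTS=28

# Print the next increment: 0 (full backup) up to and including $INCREMENTS
get_next_increment() {
    local last=-1 next
    if [[ -r "$LAST_FILE" ]]; then
        last=$(<"$LAST_FILE")
    fi
    # A missing or corrupt "last" file starts over with a full backup
    [[ "$last" =~ ^[0-9]+$ ]] || last=-1
    next=$(( last + 1 ))
    # Past the limit (also after lowering INCREMENTS): rotate back to 0
    if (( next > INCREMENTS )); then
        next=0
    fi
    echo "$next"
}

# Record INC as done; call this only after the rsync pass succeeded
signoff_increment() {
    mkdir -p "${LAST_FILE%/*}"
    echo "$1" > "$LAST_FILE"
}
```

Typical use: inc=$(get_next_increment), run the back-up for $inc, and call signoff_increment "$inc" only if it succeeded, so a failed run is retried with the same increment.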
You're free to change the number of increments at any point in time. Increasing it will create more numbered increment folders. Lowering it won't delete the higher-numbered folders, though; they are simply skipped.
The missing pieces
Rsync profile password
When using a remote rsync profile, it usually needs authentication. Instead of using the RSYNC_PASSWORD variable, it's safer to use a password file.
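A minimal sketch of a get_rsync_opts function that writes the password and exclude files and echoes the matching options. The variable names, the placeholder password and the exclude patterns are hypothetical; --password-file and --exclude-from are regular rsync options, and --password-file only applies to rsync daemon (rsync://) connections, not to rsync over ssh:

```shell
#!/usr/bin/env bash
# All names and values below are hypothetical placeholders.
LOCAL_DIR="$HOME/.backup"
RSYNC_SECRET="$LOCAL_DIR/rsync.secret"
RSYNC_EXCLUDE="$LOCAL_DIR/rsync.exclude"
RSYNC_PROFILE_PASSWORD="change-me"
EXCLUDES=(".cache/" "*.tmp")

get_rsync_opts() {
    mkdir -p "$LOCAL_DIR"
    # rsync rejects a password file that other users can read, so create it
    # with owner-only permissions (umask in a subshell, then chmod to be sure)
    ( umask 077; printf '%s\n' "$RSYNC_PROFILE_PASSWORD" > "$RSYNC_SECRET" )
    chmod 600 "$RSYNC_SECRET"
    # One exclude pattern per line, as --exclude-from expects
    printf '%s\n' "${EXCLUDES[@]}" > "$RSYNC_EXCLUDE"
    # Only echo here: callers capture this output with $(get_rsync_opts)
    echo "--archive --delete --password-file=$RSYNC_SECRET --exclude-from=$RSYNC_EXCLUDE"
}
```

It's meant to be used unquoted, as in rsync $(get_rsync_opts) "$dir" rsync://user@server/backup/, so the paths in the options must not contain spaces.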
Where we've been using $RSYNC_DEFAULTS until now, we now need to use $(get_rsync_opts) instead; see the final script for details. Also, a trap is added to the main function to ensure rsync.exclude and rsync.secret are removed, even if the script is killed.
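A sketch of that cleanup trap, with hypothetical file locations. The EXIT trap covers normal termination and errors; the INT and TERM traps turn Ctrl-C and a plain kill into a regular exit, which then fires the EXIT trap as well. A kill -9 (SIGKILL) can't be caught by any trap:

```shell
#!/usr/bin/env bash
# Hypothetical locations of the temporary files written by get_rsync_opts
RSYNC_SECRET="$HOME/.backup/rsync.secret"
RSYNC_EXCLUDE="$HOME/.backup/rsync.exclude"

cleanup() {
    rm -f "$RSYNC_SECRET" "$RSYNC_EXCLUDE"
}

main() {
    trap cleanup EXIT        # normal exit, "exit 1", end of script
    trap 'exit 130' INT      # Ctrl-C: exit, which fires the EXIT trap
    trap 'exit 143' TERM     # plain "kill": same
    # ... write the files, run rsync $(get_rsync_opts) ..., etc.
}
```

Registering the traps at the start of main means the files are removed no matter which later step fails.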
MySQL backup
If you're using MySQL, you can add a simple backup_mysql function that creates gzip files of your databases. Don't forget to call it in the main function, though.
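A sketch of such a function, assuming mysql and mysqldump can log in without prompting (for example through a ~/.my.cnf file) and that MYSQL_BACKUP_DIR, a hypothetical path, lies inside one of your BACKUP_DIRS so rsync picks the dumps up. It writes one gzip file per database and skips MySQL's internal schemas:

```shell
#!/usr/bin/env bash
MYSQL_BACKUP_DIR="/home/user/mysql-backup"   # hypothetical; keep it inside BACKUP_DIRS

backup_mysql() {
    local db
    mkdir -p "$MYSQL_BACKUP_DIR"
    # -B: tab-separated batch output, -N: no column header, -e: run and exit
    mysql -B -N -e 'SHOW DATABASES' | while IFS= read -r db; do
        case "$db" in
            information_schema|performance_schema|sys) continue ;;
        esac
        # --skip-comments drops the "Dump completed on" line and gzip -n
        # leaves the name and timestamp out of the gzip header
        mysqldump --skip-comments --single-transaction "$db" \
            | gzip -n > "$MYSQL_BACKUP_DIR/$db.sql.gz"
    done
}
```

With both options, an unchanged database should produce an identical file that rsync doesn't need to transfer again.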
Logging
You usually want cron jobs to remain silent, but you also want to be able to look up what your back-up script did or is doing. Therefore, a simple log function can be added to forward output to the log files. It only prints to the screen when you run the script manually from an active terminal.
Keep in mind not to call log in get_next_increment or get_rsync_opts, because they rely on echo to return their result: anything log prints there would end up in the captured value.
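A minimal sketch of log and a hypothetical helper log_output, assuming a log file at $HOME/.backup/backup.log. The check [[ -t 1 ]] is true only when standard output is a terminal, which is what keeps cron runs silent. Because log writes to standard output in that case, calling it inside a function whose output is captured with $(...) would mix the log line into the result:

```shell
#!/usr/bin/env bash
LOG_FILE="$HOME/.backup/backup.log"   # hypothetical location

# Append a timestamped line to the log; echo it too when run from a terminal
log() {
    local msg
    msg="$(date '+%Y-%m-%d %H:%M:%S') $*"
    mkdir -p "${LOG_FILE%/*}"
    echo "$msg" >> "$LOG_FILE"
    if [[ -t 1 ]]; then
        echo "$msg"
    fi
}

# Hypothetical helper: forward a command's output line by line into the log
log_output() {
    local line
    while IFS= read -r line; do
        log "$line"
    done
}
```

For example, rsync ... 2>&1 | log_output stores rsync's messages in the log. Writing the terminal copy to stderr (echo "$msg" >&2) would be an alternative that makes log safe to call anywhere.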
The Final Script…
It can be found in our blog post about our Improved Rsync Backup Script, or you can find the latest version on GitHub.
Conclusion
Even though the resulting script is a bit more involved than some existing tools, the result is a fool-proof incremental back-up that you can tailor exactly to your liking. It can push your back-ups to both local and remote targets, and you can change the number of increments whenever you want.
Compared to the old situation, where we needed seven times the space of the source system, we can now store one full backup and between 20 and 28 increments in the same space: almost a month's worth! In other words, the improvement is substantial. Nonetheless, we would like to improve the script to support a set number of daily, weekly and monthly back-ups, allowing for a longer retention period. We're currently researching the possibilities and will post about it as soon as we've found a stable and proven solution.
A note on Duplicity
If you don't like to script so much, Duplicity is probably a good replacement for you. It supports encryption (which is a big pro), sending files over many protocols and can be set up in a single line of code. Don't want to script at all? Try Déjà Dup, this graphical interface for Duplicity makes setting up your back-up a breeze.
I prefer the manual script, though. It's easy to change, and I know exactly where to find each file, since the result is a simple flat file structure. Duplicity creates multiple archives for each snapshot, which need to be searched through using Duplicity's tools. For me, that's too much of a hassle.
Perfacilis Back-up Service
The Perfacilis Back-up service alerts you when a back-up didn't finish within the scheduled time frame, or when a lot of data changes at once (which might indicate ransomware encrypting your files). You only pay for the amount of space your back-ups use.
Sources
- Defensive BASH programming
- Easy Automated Snapshot-Style Backups with Linux and Rsync
- Shell Parameter Expansion
Changelog
- 2022-06-29
  - Removed the complete script and added a reference to our latest blog post and GitHub.
- 2020-06-22
  - Added the --skip-comments option to $MYSQLDUMP to get rid of the "Dump completed on …" comment. This avoids saving a new dump remotely when the data itself hasn't changed.
- 2020-07-03
  - Replaced ${INC/0/current} with ${INC/#0/current}, so that numbers such as 10 or 20 no longer have their 0 replaced.
  - Silenced rsync's "file has vanished" errors, using an example from Benoit Jacquemont.
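The expansion change from the 2020-07-03 entry can be checked in isolation. The function names below are hypothetical; the point is that ${INC/0/current} replaces the first 0 anywhere in the value, while ${INC/#0/current} only matches at the start:

```shell
#!/usr/bin/env bash
# Hypothetical helpers mapping an increment number to a folder name.

# Fixed form: "0" is replaced only when it is at the start of INC
folder_for() {
    local INC=$1
    echo "${INC/#0/current}"
}

# Old form, for comparison: replaces the first "0" anywhere in INC
old_folder_for() {
    local INC=$1
    echo "${INC/0/current}"
}
```

With INC=10, the old form yields 1current, while the new form leaves 10 untouched; only increment 0 maps to current.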