Backing Up With rsync And Managing Previous Versions/History

Many backup programs use the versatile tool rsync. With rsync it is very easy to sync files and directories to the local or a remote host, and thus to create a copy. But most of these programs do not keep a history of changed and deleted data: files deleted at the source are also deleted in the backup copy, and changes are simply overwritten. This howto describes how to keep track of these changed and deleted files.

A good rsync command is:

rsync --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices %DirToBackup% %BackupTargetDir%

where %DirToBackup% is the directory to back up, for example a home directory, /home/joe,
and %BackupTargetDir% is the directory where this directory is copied to, for example /srv/backupsimple/backup/localhost

Note that this command will create the directories home/joe in the target (because of the --relative option).

This command is fine for making a copy, but a real backup is something else. To analyse what a backup run will do, rsync has a very handy option: --dry-run. With this option rsync goes through the whole run but does not actually change anything. Combined with the options --itemize-changes and --out-format, it gives a detailed report of the actions that would be taken (deleting, overwriting or creating).

For example, if there is no backup yet of the example directory above, /home/joe, in /srv/backupsimple/backup/localhost, and the contents of /home/joe look like this:

/home/joe/DocumentA
          DocumentB
          DocumentC
          DocumentD

then the output of the command

rsync --dry-run --itemize-changes --out-format="%i|%n|" --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost

is:

.d..t......|/home/|
cd+++++++++|/home/joe/|
>f+++++++++|/home/joe/DocumentA|
>f+++++++++|/home/joe/DocumentB|
>f+++++++++|/home/joe/DocumentC|
>f+++++++++|/home/joe/DocumentD|

Analyzing this:
- the directory /home is reported as changed (only its time, the t), because it already exists in the backup directory (I backed up my own home directory, /home/sbon, earlier) and the directory joe is now being created inside it.
- the directory /home/joe is created: hence the leading c. The second character, d, says it is a directory.
- the files that follow are created: note the leading >f followed by plus signs.
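
The first two characters of each itemized line can be decoded mechanically. The following is a minimal sketch, assuming the meanings given in the rsync manual page for the update type (first character) and file type (second character); describe_item is a hypothetical helper name, not part of rsync.

```shell
# describe_item: hypothetical helper that explains the first two characters
# of an rsync --itemize-changes string.
describe_item() {
    item=$1
    case $item in
        '*deleting'*) echo "deleted"; return 0 ;;
    esac
    # First character: what happens to the item.
    case $(printf '%s' "$item" | cut -c1) in
        '>'|'<') action="transfer" ;;
        c)       action="create" ;;
        h)       action="hardlink" ;;
        .)       action="attributes-only" ;;
        *)       action="unknown" ;;
    esac
    # Second character: what kind of item it is.
    case $(printf '%s' "$item" | cut -c2) in
        f) kind="file" ;;
        d) kind="directory" ;;
        L) kind="symlink" ;;
        D) kind="device" ;;
        S) kind="special" ;;
        *) kind="unknown" ;;
    esac
    echo "$action $kind"
}

describe_item '.d..t......'
describe_item 'cd+++++++++'
describe_item '>f+++++++++'
```

Applied to the three kinds of lines in the output above, this reports an attribute-only change to a directory, a created directory and a transferred file.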

Doing the real backup:

rsync --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost

So now there is a copy (or snapshot, if you like) in /srv/backupsimple/backup/localhost.

Adding new files is not the interesting case here; changing and removing existing files is. Start with a change to one of the files:

echo "new contents" >> /home/joe/DocumentA

The dry run rsync command gives:

rsync --dry-run --itemize-changes --out-format="%i|%n|" --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost

>f.st......|/home/joe/DocumentA|

Analyzing this:
- as you can see, and probably expected, DocumentA is the only file that will be transferred.
Note the s and the t: the file size and the modification time have changed.
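
The attribute flags can be read the same way. This is a small sketch, assuming the flag positions documented in the rsync manual page (characters 3 to 8 are c, s, t, p, o and g); changed_attrs is a hypothetical helper name and it ignores the less common flags after position 8.

```shell
# changed_attrs: hypothetical helper that lists which attributes an
# itemized string reports as different (positions 3-8: c s t p o g).
changed_attrs() {
    flags=$(printf '%s' "$1" | cut -c3-8)
    out=""
    for pair in c:checksum s:size t:mtime p:permissions o:owner g:group; do
        letter=${pair%%:*}
        name=${pair#*:}
        case $flags in
            *"$letter"*) out="$out $name" ;;
        esac
    done
    echo "${out# }"     # strip the leading space
}

changed_attrs '>f.st......'
```

For the DocumentA line above it prints the size and the modification time, matching the s and the t in the output.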

So, before doing the real backup, the current backup copy of DocumentA should be saved first.
To do so, create a timestamp:

timestamp=$(date "+%Y-%m-%d %H:%M:%S")

This looks like:

2010-04-18 20:55:08

Now create the "history" tree:

install --directory "/srv/backupsimple/history/localhost/$timestamp"
install --directory /srv/backupsimple/log/localhost/

Note the quotes: they are necessary because of the space in the timestamp. Write the names of the files that have to be copied to the date-based history tree to a list:

echo "/home/joe/DocumentA" > "/srv/backupsimple/log/localhost/$timestamp.changed"
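
Because the timestamp contains a space, every path built from it has to be quoted, including the target of a redirect. A minimal sketch in a throwaway directory follows; the mktemp location and mkdir -p are assumptions for illustration that stand in for /srv/backupsimple and install --directory.

```shell
# Throwaway directory standing in for /srv/backupsimple (assumption).
base=$(mktemp -d)
timestamp=$(date "+%Y-%m-%d %H:%M:%S")

# Quoted, so the space in the timestamp stays part of a single path.
mkdir -p "$base/history/localhost/$timestamp" "$base/log/localhost"
echo "/home/joe/DocumentA" > "$base/log/localhost/$timestamp.changed"

ls "$base/log/localhost"
```

Without the quotes around the redirect target, bash would refuse the command as an ambiguous redirect.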

The rsync command:

rsync --relative --update --perms --owner --group --times --links --super --files-from="/srv/backupsimple/log/localhost/$timestamp.changed" /srv/backupsimple/backup/localhost "/srv/backupsimple/history/localhost/$timestamp"

This saves the current backup copy of DocumentA, so now it's safe to run the original rsync command. The file that is about to be overwritten has been copied to a safe place, where it can be looked up later.

rsync --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost

So now we have a snapshot of /home/joe, updated on 18 April 2010 at 20:55:08, and an earlier version of /home/joe/DocumentA.
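
The whole sequence can be tried safely in a sandbox. The sketch below is an illustration under assumptions: all paths live under a mktemp directory instead of /home and /srv/backupsimple, only a reduced set of rsync options is used, and the run is skipped when rsync is not installed.

```shell
# Sandbox walk-through: back up, change a file, save the old backup copy,
# then back up again. All directory names here are hypothetical.
if command -v rsync >/dev/null 2>&1; then
    sandbox=$(mktemp -d)
    mkdir -p "$sandbox/source/joe" "$sandbox/backup" "$sandbox/log"
    echo "old contents" > "$sandbox/source/joe/DocumentA"

    # Initial backup; run from inside source/ so --relative stores joe/DocumentA.
    (cd "$sandbox/source" && rsync --relative --recursive --times joe "$sandbox/backup")

    # Change the source file, then save the backup copy before it is overwritten.
    echo "new contents" >> "$sandbox/source/joe/DocumentA"
    timestamp=$(date "+%Y-%m-%d %H:%M:%S")
    mkdir -p "$sandbox/history/$timestamp"
    echo "joe/DocumentA" > "$sandbox/log/$timestamp.changed"
    rsync --relative --times --files-from="$sandbox/log/$timestamp.changed" \
        "$sandbox/backup" "$sandbox/history/$timestamp"

    # The real backup may now overwrite the backup copy.
    (cd "$sandbox/source" && rsync --relative --recursive --times joe "$sandbox/backup")

    # The saved earlier version of the file.
    cat "$sandbox/history/$timestamp/joe/DocumentA"
else
    echo "rsync not found; skipping the walk-through" >&2
fi
```

Afterwards the history directory holds the earlier version of DocumentA, while the backup directory holds the current one.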

With deleted files this is similar:

rm /home/joe/DocumentD

rsync --dry-run --itemize-changes --out-format="%i|%n|" --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost

.d..t......|/home/joe/|
*deleting |home/joe/DocumentD|

Analyzing this output:
- the directory times of /home/joe are changed, which is always the case when a file is removed.
- and of course the file DocumentD is reported as deleted.

First create a new timestamp:

timestamp=$(date "+%Y-%m-%d %H:%M:%S")
echo "$timestamp"

2010-04-18 20:56:30

Create the history dir:

install --directory "/srv/backupsimple/history/localhost/$timestamp"

echo "/home/joe/DocumentD" > "/srv/backupsimple/log/localhost/$timestamp.deleted"

The rsync command to back up the backup is:

rsync --relative --update --perms --owner --group --times --links --super --files-from="/srv/backupsimple/log/localhost/$timestamp.deleted" /srv/backupsimple/backup/localhost "/srv/backupsimple/history/localhost/$timestamp"

And again after this command the real rsync command:

rsync --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost

 

Generalized approach

A script that does the things described above has to be more general.

First set some variables:

DirToBackup=/home/joe
timestamp=$(date "+%Y-%m-%d %H:%M:%S")

install --directory "/srv/backupsimple/history/localhost/$timestamp"
install --directory /srv/backupsimple/backup/localhost/
install --directory /srv/backupsimple/log/localhost/

Do the dry run and write the output to a file:

rsync --dry-run --itemize-changes --out-format="%i|%n|" --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices $DirToBackup /srv/backupsimple/backup/localhost | sed '/^ *$/d' > "/srv/backupsimple/log/localhost/$timestamp.dryrun"

Note: the sed command deletes empty lines.

Looking at the format of the dryrun file, the created, changed and deleted items can be extracted as follows:

Created and changed files:

grep "^.f" "/srv/backupsimple/log/localhost/$timestamp.dryrun" >> "/srv/backupsimple/log/localhost/$timestamp.onlyfiles"

grep "^.f+++++++++" "/srv/backupsimple/log/localhost/$timestamp.onlyfiles" | awk -F '|' '{print $2 }' | sed 's@^/@@' >> "/srv/backupsimple/log/localhost/$timestamp.created"

grep --invert-match "^.f+++++++++" "/srv/backupsimple/log/localhost/$timestamp.onlyfiles" | awk -F '|' '{print $2 }' | sed 's@^/@@' >> "/srv/backupsimple/log/localhost/$timestamp.changed"

Some notes:
- the sed commands remove the leading slash, to make the paths relative instead of absolute
- the dot in the grep pattern (^.f) is a regular expression wildcard here and should not be taken literally

Created and changed directories:

grep "^\.d" "/srv/backupsimple/log/localhost/$timestamp.dryrun" | awk -F '|' '{print $2 }' | sed -e 's@^/@@' -e 's@/$@@' >> "/srv/backupsimple/log/localhost/$timestamp.changed"

grep "^cd" "/srv/backupsimple/log/localhost/$timestamp.dryrun" | awk -F '|' '{print $2 }' | sed -e 's@^/@@' -e 's@/$@@' >> "/srv/backupsimple/log/localhost/$timestamp.created"

Some notes:
- the sed commands remove the leading slash and the trailing slash of each path, again to make them relative and to prevent "recursive" behaviour; rsync is sensitive to that
- the dot in the grep pattern (^\.d) should be taken literally

Deleted files and directories:

grep "^*deleting" "/srv/backupsimple/log/localhost/$timestamp.dryrun" | awk -F '|' '{print $2 }' >> "/srv/backupsimple/log/localhost/$timestamp.deleted"

Notes:
- the paths do not start with a slash, so removing them is not necessary
- a trailing slash is harmless here: deleting a dir means always recursive

So now there are the files $timestamp.created, $timestamp.changed and $timestamp.deleted.
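
The same classification can also be done in a single pass. The following is a sketch, not the script the commands above were taken from: it runs awk over a made-up sample of dry-run output in a throwaway directory, and the file names sample.created, sample.changed and sample.deleted are hypothetical.

```shell
workdir=$(mktemp -d)

# Made-up dry-run output in the "%i|%n|" format used above.
cat > "$workdir/sample.dryrun" <<'EOF'
.d..t......|/home/joe/|
>f.st......|/home/joe/DocumentA|
>f+++++++++|/home/joe/DocumentE|
cd+++++++++|/home/joe/newdir/|
*deleting  |home/joe/DocumentD|
EOF

awk -F '|' -v out="$workdir/sample" '
    # Every line: take the path field and drop a leading slash.
    { path = $2; sub(/^\//, "", path) }
    # Deleted items keep a trailing slash, as in the grep version.
    /^\*deleting/ { print path > (out ".deleted"); next }
    # All other items lose a trailing slash.
    { sub(/\/$/, "", path) }
    # New files and new directories.
    /^.f\+\+\+\+\+\+\+\+\+/ || /^cd/ { print path > (out ".created"); next }
    # Changed files and changed directories.
    /^.f/ || /^\.d/ { print path > (out ".changed") }
' "$workdir/sample.dryrun"

cat "$workdir/sample.changed"
```

The order of the rules matters: the deleted and created patterns are tested first and end with next, so only the remaining file and directory lines reach the changed list.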

The file with created items is only there for logging. You cannot, and do not have to, save files that are not in the backup yet!

Cat the changed and the deleted items together:

cat "/srv/backupsimple/log/localhost/$timestamp.deleted" > /tmp/tmp.rsync.list
cat "/srv/backupsimple/log/localhost/$timestamp.changed" >> /tmp/tmp.rsync.list
sort --output=/tmp/rsync.list --unique /tmp/tmp.rsync.list
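
Since sort accepts several input files, the intermediate file is optional. A small sketch, assuming a mktemp directory instead of fixed names in /tmp so that two runs cannot overwrite each other's lists; the list contents are made up and the duplicate entry is artificial, only there to show the effect of the unique option.

```shell
listdir=$(mktemp -d)

# Made-up lists; "home/joe" appears in both on purpose.
printf '%s\n' "home/joe/DocumentD" "home/joe" > "$listdir/sample.deleted"
printf '%s\n' "home/joe" "home/joe/DocumentA" > "$listdir/sample.changed"

# Merge, sort and drop duplicates in one step.
LC_ALL=C sort -u "$listdir/sample.deleted" "$listdir/sample.changed" > "$listdir/rsync.list"

cat "$listdir/rsync.list"
```

The resulting rsync.list can be passed to --files-from in the same way as /tmp/rsync.list.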

Now do the backup of the backup:

rsync --relative --update --perms --owner --group --times --links --super --files-from=/tmp/rsync.list /srv/backupsimple/backup/localhost/ "/srv/backupsimple/history/localhost/$timestamp"

Finally do the real backup:

rsync --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices $DirToBackup /srv/backupsimple/backup/localhost

One note: I've copied these commands from a script. There might be some errors, but I hope the idea is clear.

Local and remote

The above describes how to do a backup locally, but it is also very possible to back up to a remote host running an rsync daemon. That requires a more complicated configuration. The dry run and the real backup are simple; the difficult step is backing up the backup. The files listing the created, changed and deleted items are on the localhost, while this step has to be performed on the remote host.

There are various ways to solve this. One of them is mounting the remote host with sshfs, so the localhost can do the backup as if everything were local.

A better (imho) solution is to create a separate "queue" share on the rsync server (besides the backup and the history shares), to which the file listing the items to be saved from the backup is synced. The rsync server has the ability to run pre and post scripts. When the localhost tries to do the real backup, a pre script should check whether there is a list in the queue that has to be processed first. If so, it processes that list first. The rsync command on the localhost will simply wait until the pre phase is finished.
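
What such a pre script might look like is sketched below. It assumes that the pre script is hooked in through the "pre-xfer exec" parameter of rsyncd.conf, and that lists arrive in the queue with a .list suffix; process_queue and show_list are hypothetical names, and the handler only prints what it would do.

```shell
# process_queue: hypothetical helper for a pre script. It hands every *.list
# file in a queue directory to a handler and marks the list done on success.
process_queue() {
    queue_dir=$1
    handler=$2
    for list in "$queue_dir"/*.list; do
        [ -e "$list" ] || continue          # the glob matched nothing
        if "$handler" "$list"; then
            mv "$list" "$list.done"
        else
            echo "processing $list failed" >&2
            return 1
        fi
    done
}

# Stand-in handler for this demonstration. On a real server this is where
# the "backup of the backup" rsync with --files-from="$1" would run.
show_list() {
    echo "would process: $1"
}

queue=$(mktemp -d)
echo "home/joe/DocumentA" > "$queue/sample.list"
process_queue "$queue" show_list
```

Returning a non-zero status when a list cannot be processed is deliberate: if the pre script fails, the rsync daemon aborts the transfer, so the real backup cannot overwrite files that have not been saved yet.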

 
