This is a guest post, written by Arun Kumar Sori.
About the Author:
Arun Kumar Sori is an open source enthusiast, a C++ lover, and an engineer in the making who loves to learn and share.
Performing Incremental backups using tar
Most of us are familiar with the “tar” command on Linux. We mostly use it to archive files or to get files back from existing archives.
For those who don’t know, “tar” stands for tape archive, and system administrators use it to take backups. The tar command packs a collection of files and directories into a single archive file, commonly called a tarball, which can optionally be compressed with gzip or bzip2. It is the most widely used command for creating archives that can be moved easily from one disk to another or from machine to machine.
In this tutorial, I am going to show you the usage of a rarely used feature of the tar command known as Incremental Dumps.
First of all, let’s see some basic usage of tar:
1. Creating archives with tar
tar -cvf archive.tar /path/to/dir/
This example creates an archive archive.tar for files in directory /path/to/dir.
Details of the arguments are:
- c – creating archive
- v – verbose mode
- f – the name of the archive file (given as the next argument)
Note that this command creates a plain archive, not a compressed one. For compression, add “z” for tar.gz or “j” for tar.bz2.
For example:
tar -cvzf archive.tar.gz /path/to/dir
This creates a tar.gz archive.
tar -cvjf archive.tar.bz2 /path/to/dir
This creates a tar.bz2 archive.
2. Extracting the archives
tar -xvf archive.tar
Here, the “x” argument tells tar to extract the archive.
The same command works for .tar.gz and .tar.bz2 archives too, since GNU tar detects the compression automatically when extracting.
Now let’s move on to a more advanced feature of tar, which is the topic of this tutorial.
Normally, if we have a large amount of data (which is common nowadays) stored on our devices, a backup can take a long time to complete.
So we would take a full backup the first time, and on every later run we would want only the files that were modified or added to go into the backup, leaving out the unchanged and obsolete files.
tar provides this feature through the argument “--listed-incremental=snapshot-file”, where snapshot-file is a special file that tar maintains to determine which files have been added, modified, or deleted.
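To try the basic commands without touching real data, here is a minimal sketch of a create, list, and extract round trip. The temporary directory layout and the file name foo are made up for illustration; the “t” (list) and “-C” (change directory) options are standard tar options that are not used elsewhere in this post.

```shell
set -eu

# Throwaway working area (hypothetical layout, not from the post).
work=$(mktemp -d)
mkdir -p "$work/src" "$work/out"
echo "hello" > "$work/src/foo"

# Create a gzip-compressed archive. -C switches into $work first,
# so the archive stores the relative path src/foo.
tar -czf "$work/archive.tar.gz" -C "$work" src

# List the archive contents without extracting anything.
tar -tzf "$work/archive.tar.gz"

# Extract into a different directory and read the file back.
tar -xzf "$work/archive.tar.gz" -C "$work/out"
restored=$(cat "$work/out/src/foo")
echo "$restored"
```

Remove the directory in $work when you are done experimenting.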
So let’s see an example :
tar --listed-incremental=snapshot.file -cvzf backup.tar.gz /path/to/dir
tar: .: Directory is new
./
./1
./bar
./foo
./snapshot.file
Let’s understand what’s happening with the above command.
The only addition to the usual archive-creating command is the --listed-incremental argument.
If snapshot.file does not exist, tar takes a full (level-0) backup and creates the snapshot file with the additional metadata.
Otherwise, it creates an incremental archive containing only the changed files, which it finds by examining snapshot.file. This is called a “level-1” backup.
tar --listed-incremental=snapshot.file -cvzf backup.1.tar.gz /path/to/dir
./
./backup.tar.gz
./foo
Note that tar overwrites the snapshot file in place, so its original (level-0) contents are lost and replaced with the new state.
So if we want to make more “level-1” backups, we can copy the snapshot file (ideally right after the level-0 backup, before it gets updated) and give the copy to tar. If we don’t need that, we need to do nothing: tar will simply create another incremental archive relative to the previous one.
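The level-0 and level-1 steps can be reproduced in a scratch directory. The sketch below assumes GNU tar; the directory layout and the file names foo, bar and baz are hypothetical. It also keeps the snapshot file outside the directory being backed up, so that the snapshot does not end up inside the archive as it did in the output above.

```shell
set -eu

# Scratch area with a small tree to back up (names are illustrative).
work=$(mktemp -d)
mkdir "$work/data"
echo one > "$work/data/foo"
echo two > "$work/data/bar"

# Level 0: the snapshot file does not exist yet, so tar archives everything
# and creates the snapshot file.
tar --listed-incremental="$work/snapshot.file" -czf "$work/backup.tar.gz" -C "$work" data

# Wait a moment so the new timestamp is clearly later, then add a file.
sleep 1
echo three > "$work/data/baz"

# Level 1: the snapshot file exists, so only the changes are archived,
# and the snapshot file is updated in place.
tar --listed-incremental="$work/snapshot.file" -czf "$work/backup.1.tar.gz" -C "$work" data

# Compare what each archive contains.
level0=$(tar -tzf "$work/backup.tar.gz")
level1=$(tar -tzf "$work/backup.1.tar.gz")
printf 'level 0:\n%s\n' "$level0"
printf 'level 1:\n%s\n' "$level1"
```

The level-1 listing should still show the directory entry, because incremental archives record directory contents, but of the regular files only the new one.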
cp snapshot.file snapshot.file.1
tar --listed-incremental=snapshot.file.1 -cvzf backup.1.tar.gz /path/to/dir
tar updates only the copy, so the original snapshot file stays as it was and the result is again a “level-1” backup.
Of course, we can force tar to take a “level-0” backup either by removing the snapshot file or by giving the “--level=0” argument to tar.
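As a rough illustration of the copy trick, the sketch below takes two “level-1” backups against the same level-0 snapshot, assuming GNU tar. The file names monday and tuesday and the archive names backup.1a.tar.gz and backup.1b.tar.gz are invented for this example. Each run gets a fresh copy of the level-0 snapshot, so the second archive holds everything changed since the full backup rather than only the changes since the previous run.

```shell
set -eu

work=$(mktemp -d)
mkdir "$work/data"
echo one > "$work/data/foo"

# Level 0: full backup, creates the snapshot file.
tar --listed-incremental="$work/snapshot.file" -czf "$work/backup.tar.gz" -C "$work" data

# First change, archived against a copy of the level-0 snapshot.
sleep 1
echo two > "$work/data/monday"
cp "$work/snapshot.file" "$work/snapshot.file.1"
tar --listed-incremental="$work/snapshot.file.1" -czf "$work/backup.1a.tar.gz" -C "$work" data

# Second change, again against a fresh copy of the untouched level-0 snapshot.
sleep 1
echo three > "$work/data/tuesday"
cp "$work/snapshot.file" "$work/snapshot.file.1"
tar --listed-incremental="$work/snapshot.file.1" -czf "$work/backup.1b.tar.gz" -C "$work" data

second=$(tar -tzf "$work/backup.1b.tar.gz")
printf '%s\n' "$second"
```

With this scheme a restore needs only the level-0 archive plus the latest level-1 archive, at the cost of each level-1 archive growing over time.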
tar --listed-incremental=snapshot.file --level=0 -cvzf backup.2.tar.gz /path/to/dir
tar: .: Directory is new
./
./1
./bar
./foo
./snapshot.file
Note that incremental dumps depend crucially on time-stamps. Any interference with them could cause trouble.
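The effect of “--level=0” can be checked with a small experiment, sketched here under the assumption of a GNU tar version that supports the option. The scratch layout and the file name foo are hypothetical. Nothing changes between the two runs, so a plain incremental run would have no regular files to archive, while the forced level-0 run archives everything again.

```shell
set -eu

work=$(mktemp -d)
mkdir "$work/data"
echo one > "$work/data/foo"

# First run: creates the snapshot file and a full archive.
tar --listed-incremental="$work/snapshot.file" -czf "$work/backup.tar.gz" -C "$work" data

# Second run with --level=0: tar discards the existing snapshot contents
# and takes a full backup again, even though nothing has changed.
tar --listed-incremental="$work/snapshot.file" --level=0 -czf "$work/backup.2.tar.gz" -C "$work" data

full=$(tar -tzf "$work/backup.2.tar.gz")
printf '%s\n' "$full"
```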
In the same way, we can extract the incremental backups.
For extracting from incremental archives, we need to provide the --listed-incremental argument.
In this case, tar does not need to access the snapshot file, as all the data necessary for extraction is stored in the archive itself. So, when extracting, we can give any argument to ‘--listed-incremental’; the usual practice is to use ‘--listed-incremental=/dev/null’.
When extracting from an incremental backup, tar attempts to restore the exact state the file system had when the archive was created. In particular, it will delete those files in the file system that did not exist in their directories when the archive was created.
So if we had created several levels of incremental files, then in order to restore the exact contents the file system had when the last level was created, we will need to restore from all backups in turn.
At first, do level-0 extraction:
tar --listed-incremental=/dev/null -xvf backup.tar.gz
and then, level-1 extraction:
tar --listed-incremental=/dev/null -xvf backup.1.tar.gz
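To see the restore order and the deletion behaviour together, here is a sketch that backs up a scratch tree, deletes one file and adds another, and then restores both levels into an empty directory. It assumes GNU tar; the layout and the names foo, bar and baz are made up for illustration.

```shell
set -eu

work=$(mktemp -d)
mkdir "$work/data" "$work/restore"
echo one > "$work/data/foo"
echo two > "$work/data/bar"

# Level 0: full backup.
tar --listed-incremental="$work/snapshot.file" -czf "$work/backup.tar.gz" -C "$work" data

# Between the backups: remove one file and add another.
sleep 1
rm "$work/data/bar"
echo three > "$work/data/baz"

# Level 1: records the new file and the new directory contents.
tar --listed-incremental="$work/snapshot.file" -czf "$work/backup.1.tar.gz" -C "$work" data

# Restore in order: level 0 first, then level 1.
tar --listed-incremental=/dev/null -xzf "$work/backup.tar.gz" -C "$work/restore"
tar --listed-incremental=/dev/null -xzf "$work/backup.1.tar.gz" -C "$work/restore"

ls "$work/restore/data"
```

After the level-0 step the restored tree still contains bar; the level-1 step removes it and adds baz, which is why the archives have to be applied in order and why it is safer to restore into a directory that holds nothing else.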
For more detailed explanation go to: http://www.gnu.org/software/tar/manual/html_node/Incremental-Dumps.html
Note that the above applies to the GNU version of tar.
Also, I’ve made a shell script for making incremental backups and storing them at remote locations (using ssh) here.
Explanations are given within the file. Any comments and improvements are most welcome.
Cheers!