Symbolically linking directories during incremental file backups

[Figure: Current hard-linking schema for incremental file backups]

Currently, files that are unchanged between incremental file backups are hard linked. This means that more than one file path exists for the same data. The advantages are:

  • The operating system takes care of managing the data. If no file entries point at the data anymore, it deletes the data. Removing a backup means removing a folder.
  • Every file backup looks like a full backup on the backup server. This makes restoring and browsing file backups really easy.
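
The hard-linking scheme described above can be sketched in a few lines of Python. This is only an illustration and not UrBackup's actual code: the function name `incremental_backup` and the byte-by-byte comparison used to decide whether a file is "unchanged" are assumptions made for this example.

```python
import filecmp
import os
import shutil


def incremental_backup(source, previous, current):
    """Back up `source` into `current`; hard link files unchanged since `previous`.

    Hypothetical helper for illustration only. "Unchanged" is decided here by
    comparing file contents, which is an assumption of this sketch.
    """
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.normpath(os.path.join(current, rel)), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            old = os.path.normpath(os.path.join(previous, rel, name))
            new = os.path.normpath(os.path.join(current, rel, name))
            if os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                # Unchanged: add a second file entry for the same data.
                os.link(old, new)
            else:
                # New or changed: store a fresh copy.
                shutil.copy2(src, new)
```

Every unchanged file costs one `os.link` call, so the number of hard link operations grows with the number of unchanged files, not with the amount of changed data.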

But there are issues in some cases:

  • On some filesystems (e.g. NTFS) and/or on slow hard disks, hard link operations are slow and become the bottleneck during backups.
  • Some filesystems only have a limited number of file entries available (e.g. ext).

With a lot of unchanged files between incremental backups, it is almost certain that the hard linking operation is the bottleneck.

To improve this situation, UrBackup server 1.4 will create symlinks to large directories that are completely unchanged during incremental backups. This avoids a lot of the hard linking operations.

[Figure: Symlinking large unchanged directories to a pooled directory]

If, during incremental backups, a large unchanged directory is detected, it is moved into a client-specific directory pool. Then a symbolic link to the directory in the pool is created in both the current and the last backup. If the last backup already contains a link to the pool, only a symlink from the current backup to the pooled directory is created. Because the operating system does not count how many symbolic links point to a directory, the number of symlinks per pooled directory is saved in UrBackup's internal database. If a backup that has symlinks is deleted, the relevant entries in the database are removed. If no entries for the directory remain in the database, the pooled directory is removed.
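
A minimal sketch of this pooling and reference counting might look as follows. It is not UrBackup's implementation: the function names, the `pool_name` parameter and the `refs` dictionary (a stand-in for the internal database) are all hypothetical, and the sketch assumes a POSIX system with the pool on the same filesystem as the backups.

```python
import os
import shutil


def symlink_unchanged_dir(last_dir, current_dir, pool_root, pool_name, refs):
    """Make `current_dir` a symlink to the pooled copy of the unchanged `last_dir`.

    `refs` maps a pooled directory to the set of symlink paths pointing at it.
    It is a hypothetical stand-in for the database entries.
    """
    if os.path.islink(last_dir):
        # The last backup already links into the pool: reuse that directory.
        pooled = os.readlink(last_dir)
    else:
        # Move the real directory into the pool and leave a symlink behind.
        os.makedirs(pool_root, exist_ok=True)
        pooled = os.path.abspath(os.path.join(pool_root, pool_name))
        os.rename(last_dir, pooled)  # assumes the same filesystem
        os.symlink(pooled, last_dir, target_is_directory=True)
        refs.setdefault(pooled, set()).add(last_dir)
    os.symlink(pooled, current_dir, target_is_directory=True)
    refs.setdefault(pooled, set()).add(current_dir)
    return pooled


def remove_backup_symlink(link_path, refs):
    """Drop one symlink; delete the pooled directory once nothing references it."""
    pooled = os.readlink(link_path)
    os.unlink(link_path)
    refs[pooled].discard(link_path)
    if not refs[pooled]:
        del refs[pooled]
        shutil.rmtree(pooled)
```

Linking a whole directory is a single `os.symlink` call regardless of how many files the directory contains, which is where the savings over per-file hard links come from.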

There are some caveats:

  • There are now symbolic links in the backups. Some tools may not be able to handle them, or need to be told how to handle them.
  • If you rename the backup folder, all symbolic links are invalidated and stop working. The UrBackup server has a tool to fix that, but it will be slow.
  • The reference counting of symbolic links has some tricky corner cases and needs some testing.
  • You can no longer simply delete file backups by hand, because the pooled directories they reference would then not be deleted. The UrBackup server includes a tool to detect that, but it might become slow as the number of directories in the pool increases.
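
To illustrate the last caveat, a check for pooled directories that no backup references anymore could be sketched like this. This is an assumption about how such a check might work, not the tool shipped with the UrBackup server: `find_orphaned_pool_dirs` is a hypothetical name, and the sketch assumes the pool lives outside the scanned backup folders.

```python
import os


def find_orphaned_pool_dirs(backups_root, pool_root):
    """Return pooled directories that no symlink under `backups_root` points to.

    Hypothetical helper: it scans every backup, so its run time grows with the
    number of backups and directories.
    """
    pool_root = os.path.realpath(pool_root)
    referenced = set()
    # With followlinks=False, symlinks to directories are listed in `dirs`
    # but are not descended into.
    for root, dirs, _files in os.walk(backups_root, followlinks=False):
        for name in dirs:
            path = os.path.join(root, name)
            if os.path.islink(path):
                referenced.add(os.path.realpath(path))
    pooled = {os.path.join(pool_root, name) for name in os.listdir(pool_root)}
    return sorted(pooled - referenced)
```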

Symbolic linking of large unchanged directories will be the default behavior in UrBackup server 1.4, though you will be able to deactivate it in the advanced settings. UrBackup server will not symlink directories on btrfs, because snapshotting there provides a superior method for fast backups.

Comments should work now

I moved the blog from Sourceforge to a different server, because the spam detection plugins of WordPress do not work on the Sourceforge webspace. The comments should work now. Sorry for the inconvenience.