Current hard linking incremental file backup scheme
Currently, files that are unchanged between incremental file backups are hard linked. This means that more than one file path exists for the same data. The advantages are:
- The operating system takes care of managing the data. If no file entries point at the data any more, it deletes the data. Removing a backup means removing a folder.
- Every file backup looks like a full backup on the backup server. This makes restoring and browsing file backups really easy.
But there are some issues in some cases:
- On some filesystems (e.g. NTFS) and/or slow hard disks the hard link operations are slow and become the bottleneck during backups.
- Some filesystems only have a limited number of available file entries (e.g. ext).
With a lot of unchanged files between incremental backups it is almost certain the hard linking operation is the bottleneck.
To improve this situation UrBackup server 1.4 will create symlinks to large directories which are completely unchanged during incremental backups. This avoids a lot of the hard linking operations.
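The hard linking scheme described above can be illustrated with a minimal Python sketch. It is not UrBackup's code: the function names, the flat (non-recursive) directory layout and the content comparison used to detect unchanged files are all assumptions made for the example.

```python
import os
import shutil
import tempfile


def same_content(a, b):
    """Naive change detection by comparing file contents (illustration only)."""
    with open(a, "rb") as fa, open(b, "rb") as fb:
        return fa.read() == fb.read()


def file_backup(source, last_backup, new_backup):
    """Back up a flat directory; hard link files unchanged since last_backup."""
    os.makedirs(new_backup)
    for name in sorted(os.listdir(source)):
        src = os.path.join(source, name)
        dst = os.path.join(new_backup, name)
        old = os.path.join(last_backup, name) if last_backup else None
        if old and os.path.isfile(old) and same_content(src, old):
            os.link(old, dst)          # a second path to the same data
        else:
            shutil.copyfile(src, dst)  # new or changed: store a fresh copy


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "src")
        os.makedirs(src)
        for name, text in (("a.txt", "unchanged"), ("b.txt", "v1")):
            with open(os.path.join(src, name), "w") as f:
                f.write(text)
        first = os.path.join(root, "first")
        second = os.path.join(root, "second")
        file_backup(src, None, first)
        with open(os.path.join(src, "b.txt"), "w") as f:
            f.write("v2")
        file_backup(src, first, second)
        # Link count of the unchanged file in the second backup
        print(os.stat(os.path.join(second, "a.txt")).st_nlink)
        shutil.rmtree(first)  # removing a backup is just removing its folder
        with open(os.path.join(second, "a.txt")) as f:
            print(f.read())  # the data is still reachable via the second backup
```

Every unchanged file costs one `os.link` call per incremental backup, which is the operation that becomes the bottleneck when most files are unchanged.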
Symlinking large unchanged directories to a pooled directory
If, during incremental backups, a large unchanged directory is detected, it is moved into a client-specific directory pool. Then a symbolic link to the directory in the pool is created in both the current and the last backup. If the last backup already contains a link to the pool, only a symlink to the pooled directory is created in the current backup. Because the operating system does not count how many symbolic links point to a directory, the number of symlinks per pooled directory is saved in UrBackup's internal database. If a backup that has symlinks is deleted, the relevant entries in the database are removed. If there are no entries left for a directory in the database, the pooled directory is removed.
There are some caveats:
- There are now symbolic links in the backups. Some tools may not be able to handle them, or need to be told how to handle them.
- If you rename the backup folder all symbolic links are invalidated and won’t work anymore. The UrBackup server has a tool to fix that, but it will be slow.
- The reference counting of symbolic links has some tricky corner cases and needs some testing.
- You cannot just delete file backups anymore, because then pooled directories may not be deleted. The UrBackup server includes a tool to detect that, but this might be slow with an increasing number of directories in the pool.
The symbolic linking of large unchanged directories will be the default behavior in UrBackup server 1.4, though you will be able to deactivate it in the advanced settings. UrBackup server will not symlink directories on btrfs, as there the snapshotting provides a superior method to do fast backups.
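The pooling and reference counting described in this section can be sketched as follows. This is only an illustration under assumptions: the class `DirectoryPool`, its method names and the in-memory `refs` dictionary (standing in for UrBackup's internal database) are hypothetical, and none of the tricky corner cases are handled.

```python
import os
import shutil
import tempfile


class DirectoryPool:
    """Hypothetical client-specific pool of unchanged directories."""

    def __init__(self, pool_root):
        self.pool_root = pool_root
        self.refs = {}      # pooled directory -> set of symlinks pointing at it
        self.next_id = 0
        os.makedirs(pool_root, exist_ok=True)

    def link_unchanged(self, last_dir, current_dir):
        """Make current_dir a symlink to the pooled version of last_dir."""
        if os.path.islink(last_dir):
            # The last backup already links into the pool
            pooled = os.readlink(last_dir)
        else:
            # Move the directory into the pool and link the last backup to it
            pooled = os.path.join(self.pool_root, "dir%d" % self.next_id)
            self.next_id += 1
            shutil.move(last_dir, pooled)
            os.symlink(pooled, last_dir)
            self.refs[pooled] = {last_dir}
        os.symlink(pooled, current_dir)
        self.refs[pooled].add(current_dir)

    def delete_backup(self, backup_dir):
        """Delete a backup and every pooled directory nothing links to anymore."""
        prefix = os.path.join(backup_dir, "")
        for pooled in list(self.refs):
            remaining = {l for l in self.refs[pooled] if not l.startswith(prefix)}
            if remaining:
                self.refs[pooled] = remaining
            else:
                del self.refs[pooled]
                shutil.rmtree(pooled)
        shutil.rmtree(backup_dir)  # symlinks inside are removed, not followed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        pool = DirectoryPool(os.path.join(root, "pool"))
        old, new = os.path.join(root, "old"), os.path.join(root, "new")
        os.makedirs(os.path.join(old, "big"))
        os.makedirs(new)
        pool.link_unchanged(os.path.join(old, "big"), os.path.join(new, "big"))
        print(pool.refs)
```

The sketch also shows why deleting a backup folder by hand is a problem: the entries in `refs` would never be removed, so the pooled directory would stay around forever.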
It has been a while and there are now a lot of new features in UrBackup Server 1.3.
Users of the web interface can download a Client specific installer directly from the server now. The installer has the UrBackup Server information embedded, such that the client automatically connects to the server once it is installed. I’ve also published a script that connects to a UrBackup server, creates a client named like the local computer and then downloads and executes the client installer. This enables a one click setup experience for Internet clients.
The new live log lets you see what the UrBackup server is currently busy with. You can either see all debug level log messages or client specific log messages.
You can see which files the UrBackup Server is currently working on and the usual log messages which you can also view afterwards via web interface or on the client.
A “hidden” feature is now accessible via web interface: You can disable any type of backup for any client.
More fine grained permissions for the client allow you to prevent the users from starting full file backups, but still allow incremental file backups.
The soft client quota allows you to limit the amount of storage each client can use. During the nightly cleanup UrBackup deletes the client’s backups until the storage usage is within the bounds prescribed by the soft client quota. Other than a percentage value you can also use something like “20G” as soft client quota.
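A rough sketch of how such a quota could be evaluated is shown below. The function names are made up, the unit suffixes are assumed to be powers of 1024, percentages are assumed to refer to the total backup storage size, and deleting the oldest backup first while always keeping the newest one is likewise an assumption rather than documented behavior.

```python
def parse_quota(quota, total_bytes):
    """Turn a quota such as "20G" or "50%" into a limit in bytes."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}
    quota = quota.strip().upper()
    if quota.endswith("%"):
        # Assumption: percentage of the total backup storage
        return int(total_bytes * float(quota[:-1]) / 100)
    if quota[-1] in units:
        return int(float(quota[:-1]) * units[quota[-1]])
    return int(quota)  # plain number of bytes


def enforce_soft_quota(backups, limit_bytes):
    """backups: list of (name, size_in_bytes), oldest first.

    Returns (kept, deleted). The newest backup is never deleted.
    """
    kept = list(backups)
    deleted = []
    while len(kept) > 1 and sum(size for _, size in kept) > limit_bytes:
        deleted.append(kept.pop(0))  # assumption: oldest backup goes first
    return kept, deleted


if __name__ == "__main__":
    limit = parse_quota("20G", total_bytes=0)
    backups = [("old", 15 * 1024 ** 3), ("new", 10 * 1024 ** 3)]
    print(enforce_soft_quota(backups, limit))
```

Because the quota is "soft", it is only enforced during the nightly cleanup; a client may temporarily exceed it between cleanups.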
You can now have separate backup windows for incremental/full file/image backups.
Client-side file hashes prevent the re-transfer of files that are already on the server, e.g., because another client has the same file. In some situations this drastically reduces the bandwidth requirements and speeds up file backups over Internet.
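The idea can be sketched in a few lines. The `Server` class and `backup_file` function are hypothetical, and SHA-256 is used purely for illustration; the hash function UrBackup actually uses is not stated here.

```python
import hashlib


class Server:
    """Hypothetical stand-in for the backup server's storage."""

    def __init__(self):
        self.data_by_hash = {}   # file hash -> file content
        self.backups = {}        # (client, path) -> file hash
        self.bytes_received = 0

    def has(self, digest):
        return digest in self.data_by_hash

    def store(self, client, path, digest, data=None):
        if data is not None:
            self.data_by_hash[digest] = data
            self.bytes_received += len(data)
        self.backups[(client, path)] = digest


def backup_file(server, client, path, data):
    """Hash on the client; send the content only if the server lacks it."""
    digest = hashlib.sha256(data).hexdigest()
    if server.has(digest):
        server.store(client, path, digest)        # only the hash is sent
    else:
        server.store(client, path, digest, data)  # full transfer


if __name__ == "__main__":
    server = Server()
    payload = b"x" * 1000
    backup_file(server, "pc1", "C:/setup.exe", payload)
    backup_file(server, "pc2", "D:/downloads/setup.exe", payload)
    print(server.bytes_received)
```

The cost is that the client has to read and hash every file before deciding whether to send it, which is why this mainly pays off on slow links.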
If you have performance problems with file backups, the new file entry cache may help you. If the file entry cache is enabled, file entries (a mapping from file hash to file paths) are cached in a separate database, which may speed up backups. The caches are automatically created and destroyed if this setting is changed (and the server restarted), but creation may take a long time. The LMDB cache makes heavy use of memory-mapped files and is therefore only advisable on a 64-bit operating system. It also creates a very large sparse file on Windows. When in doubt, use the SQLite cache.
Binaries of UrBackup for Raspbian are now available here (Update: See the nightly builds or the Debian “wheezy” folder of the official releases). Performance isn’t that good on the Raspberry Pi, obviously. I’d avoid using any kind of compression, for example.
First set up the backup storage. I suggest using btrfs. Plug in a USB hard disk and create a partition, e.g. using cfdisk. Then install btrfs and create the filesystem:
apt-get install btrfs-tools
Add an entry to /etc/fstab to automatically mount the USB-hard disk:
/dev/sda1 /media/backup btrfs defaults,nossd 0 0
Then mount the hard disk:
Download the UrBackup server package and install it:
dpkg -i urbackup-server_1.2-1_armhf.deb
Resolve the dependencies:
apt-get -f install
Enter e.g. /media/backup/urbackup when it asks for the backup storage path.
UrBackup server should now run and automatically back up clients as soon as you install the UrBackup client on one of your PCs in the local network (from here). If you want to change any settings, go to the web interface. Let 192.168.0.42 be the IP of your Raspberry Pi. You can then access the web interface of UrBackup by typing http://192.168.0.42:55414 into your address bar. If you want to access your files you should share /media/backup/urbackup, e.g. via Samba. There are guides on how to install Samba on a Raspberry Pi (here for example).
Sidenote: I’ve heard that it runs even better on a Cubieboard. It has a SATA port, so the USB bottleneck is gone. Unfortunately, they’re even harder to get than Raspberry Pis.
The Linux kernel 3.6 finally added a feature to btrfs which I needed to easily implement a snapshot file backup storage. This feature will be automatically enabled if the backup storage path points to a btrfs file system and btrfs supports cross-sub-volume reflinks.
A backup with snapshots works like this:
- Every file backup is put into a separate sub-volume, i.e., a file backup with a name like “121102-1010” is not a normal folder any more. It is a sub-volume in the btrfs file system.
- Btrfs can make snapshots of sub-volumes. On an incremental backup UrBackup creates a snapshot with the usual name like “121102-2210”. The snapshot has the contents of the last file backup. This snapshot operation is very fast, because no files have to be copied and no file entries have to be created. Btrfs does not internally create another copy of the file tree. It only copies the parts that are changed later on (copy-on-write).
- UrBackup deletes all the files in the snapshot that have changed or have been deleted since the last backup.
- Then UrBackup loads the files that have changed or are new. If a file was changed UrBackup only writes changed parts of the file causing btrfs to only save the blocks in the file that are actually different.
- Once the backup has to be removed UrBackup simply deletes the whole sub-volume.
This has several advantages:
- Snapshot creation is very fast, causing faster incremental backups.
- Deleting the file backups is way faster because not every file has to be deleted.
- Only data that has changed in files between incremental backups is stored in the file system. This drastically reduces the storage requirements, e.g., for large database files.
The cross-device reflinks enable UrBackup to store identical files that occur in different sub-volumes only once. On other filesystems this is done using hard links, but those only work within the same file system. Because the Linux kernel sees each sub-volume as a different file system, the cross-device reflinks have to be used. There is no disadvantage in using them instead of hard links.
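The snapshot workflow above maps onto a handful of standard commands from btrfs-progs and coreutils. The Python sketch below only builds the command lines and does not run them; the function names are made up for this example, and whether UrBackup invokes the command line tools or talks to the kernel directly is not something this post states.

```python
import posixpath


def plan_full_backup(storage, name):
    """A full backup starts with a fresh, empty sub-volume."""
    return [["btrfs", "subvolume", "create", posixpath.join(storage, name)]]


def plan_incremental_backup(storage, last, new):
    """An incremental backup starts as a snapshot of the last backup."""
    return [["btrfs", "subvolume", "snapshot",
             posixpath.join(storage, last), posixpath.join(storage, new)]]


def plan_reflink_copy(src, dst):
    """Share the data of an identical file across sub-volumes."""
    return [["cp", "--reflink=always", src, dst]]


def plan_delete_backup(storage, name):
    """Removing a backup removes the whole sub-volume at once."""
    return [["btrfs", "subvolume", "delete", posixpath.join(storage, name)]]


if __name__ == "__main__":
    storage = "/media/backup/urbackup/client1"
    plan = (plan_full_backup(storage, "121102-1010")
            + plan_incremental_backup(storage, "121102-1010", "121102-2210")
            + plan_delete_backup(storage, "121102-1010"))
    for command in plan:
        print(" ".join(command))
```

To actually execute such a plan, each list could be passed to `subprocess.run(command, check=True)` with sufficient privileges on a btrfs file system.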
I’m releasing UrBackup Server 0.26.1 and Client 0.40.1 soon. They contain only minor bug fixes and, additionally, a Russian translation.
The next major version, which will probably be 1.0, will have the following new features:
First of all you will be able to start and stop backups from the server web interface.
Then I reorganized the settings, both on the server web interface and on the client. You can also see the new bandwidth throttling feature which can limit the bandwidth usage of the backup server, both globally and for each client.
I added a few features to the new internet mode described in the last post. By default UrBackup does not do full file backups or image backups over an internet connection, but this can be enabled. The total global backup speed and the backup speed for each client can be set separately from the local backup speed. You can, e.g., use this on the client to prevent UrBackup from using all your bandwidth. In addition to encrypting the transfer over the internet, UrBackup can now also compress it.
There is a new feature which lets you archive certain backups at certain intervals. Archived backups are not deleted during cleanups until they are no longer archived. In addition to the automated archival you can also manually archive and un-archive certain file backups simply by clicking on them. For now only file backups can be archived.
These should be the major improvements. There are some minor ones as well.
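How archiving interacts with cleanup can be sketched like this. The `Backup` class, the `cleanup` function and the rule that archived backups do not count towards the maximum number of backups are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    name: str
    archived: bool = False


def cleanup(backups, max_backups):
    """backups: oldest first. Returns (kept, deleted).

    Archived backups are skipped entirely; only non-archived backups are
    counted and deleted, oldest first (assumption).
    """
    kept = list(backups)
    deleted = []
    candidates = [b for b in kept if not b.archived]
    while len(candidates) > max_backups:
        victim = candidates.pop(0)
        kept.remove(victim)
        deleted.append(victim)
    return kept, deleted


if __name__ == "__main__":
    backups = [Backup("120101-0000", archived=True), Backup("120201-0000"),
               Backup("120301-0000"), Backup("120401-0000")]
    kept, deleted = cleanup(backups, max_backups=2)
    print([b.name for b in deleted])
```

Once a backup is un-archived it becomes an ordinary cleanup candidate again and can be removed in the next cleanup run.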
Everything except the internet mode is ready for testing, so if anyone wants to help send me a mail at firstname.lastname@example.org or drop by in the forums and I will upload the appropriate builds.
Currently I’m working on a new internet mode for UrBackup. This means that you will be able to back up clients to a server on the internet with the upcoming new version.
This communication is of course encrypted and authenticated. It uses a shared key encryption with AES256 in CFB mode. It should be easy to configure: You just need to supply the server with its internet name/IP and the ports the clients should connect to. These settings, as well as random keys, are then pushed to the clients via the local (trusted) network. They can be manually entered on the client side as well. Then the key is pushed from the client to the server.
If you are not in the local network, the client tries to connect to the internet server, provided you entered something there (e.g. a DNS name or IP address). Both sides then check whether they have the same shared key. If they do, a normal connection is established, as if the client were in the local network, and backups can be performed.
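One common way for two sides to check that they hold the same shared key without sending the key itself is a challenge-response exchange. The sketch below uses HMAC-SHA256 from the Python standard library. This is an assumption chosen for illustration, not UrBackup's actual protocol, and it leaves out the AES256 encryption of the connection entirely.

```python
import hashlib
import hmac
import secrets


def make_challenge():
    """Server side: a fresh random value for every connection attempt."""
    return secrets.token_bytes(32)


def answer_challenge(shared_key, challenge):
    """Client side: prove knowledge of the key without revealing it."""
    return hmac.new(shared_key, challenge, hashlib.sha256).digest()


def verify_answer(shared_key, challenge, answer):
    """Server side: recompute the answer and compare in constant time."""
    expected = answer_challenge(shared_key, challenge)
    return hmac.compare_digest(expected, answer)


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # pushed to the client via the local network
    challenge = make_challenge()
    print(verify_answer(key, challenge, answer_challenge(key, challenge)))
```

A fresh challenge per connection means a recorded answer cannot simply be replayed later by someone who listened in.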
I’ll now implement special options for disabling image and full file backups for clients connected via internet. Then I will implement a special block-based (rsync-like) file transfer mode which will be used for those clients and which transfers less data in some scenarios.
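To give an idea of what a block-based transfer saves, here is a simplified sketch with fixed-size blocks. Unlike rsync it uses no rolling checksums, so it only detects blocks changed in place. The function names, the use of SHA-256 and the tiny block size in the demo are assumptions for illustration.

```python
import hashlib


def block_hashes(data, block_size):
    """Hash every fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def make_delta(new_data, old_hashes, block_size):
    """Client side: collect only the blocks the server does not have."""
    delta = []
    for index, digest in enumerate(block_hashes(new_data, block_size)):
        if index >= len(old_hashes) or old_hashes[index] != digest:
            start = index * block_size
            delta.append((index, new_data[start:start + block_size]))
    return delta


def apply_delta(old_data, delta, new_length, block_size):
    """Server side: rebuild the new file from the old file and the delta."""
    blocks = [old_data[i:i + block_size]
              for i in range(0, len(old_data), block_size)]
    for index, block in delta:  # indices arrive in ascending order
        if index < len(blocks):
            blocks[index] = block
        else:
            blocks.append(block)
    return b"".join(blocks)[:new_length]


if __name__ == "__main__":
    old = b"aaaabbbbcccc"
    new = b"aaaaXXXXccccdd"
    delta = make_delta(new, block_hashes(old, 4), 4)
    print(delta)
    print(apply_delta(old, delta, len(new), 4) == new)
```

Only the changed and appended blocks cross the network, at the price of hashing the file on both sides. If data is inserted near the start of a file, all following blocks shift and are transferred again, which is one of the scenarios where this mode does not help.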
Then you can look forward to backup archival and more detailed backup retention capabilities, which I’ll be working on next.
A long time ago I fixed a bug, where UrBackup Server on Windows could not backup files with a path name longer than 255 characters. It’s here:
Yesterday I reinstalled a test server and today I wanted to delete the old UrBackup backup folder. It threw error messages like you see in the screenshot.
Apparently the Windows Explorer (even in Windows Server 2008R2) cannot delete files with path names longer than 255 characters. And contrary to the error message, you also cannot move or rename them. You have to install some alternative file manager to get rid of these files, or shorten directory names such that the path length is smaller than 256 characters. Or let UrBackup delete them. In my case: I just left the folder there. I do not care. It’s a test server anyway.
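Programs that call the Windows file APIs directly can work with such long paths by using the extended-length prefix `\\?\` on absolute paths (with `\\?\UNC\` for network shares). The helper below sketches that conversion. The function name is made up, and whether UrBackup handles long paths exactly this way is an assumption.

```python
import ntpath

EXTENDED_PREFIX = "\\\\?\\"          # the literal characters \\?\
EXTENDED_UNC_PREFIX = "\\\\?\\UNC\\"  # the literal characters \\?\UNC\


def to_extended_length_path(path):
    """Convert an absolute Windows path into its extended-length form."""
    if path.startswith(EXTENDED_PREFIX):
        return path  # already converted
    if not ntpath.isabs(path):
        raise ValueError("extended-length paths must be absolute")
    # The prefix disables path normalization, so normalize beforehand
    path = ntpath.normpath(path)
    if path.startswith("\\\\"):
        # \\server\share\dir becomes \\?\UNC\server\share\dir
        return EXTENDED_UNC_PREFIX + path[2:]
    return EXTENDED_PREFIX + path


if __name__ == "__main__":
    print(to_extended_length_path("C:/urbackup/client/121102-1010/very/deep/file.txt"))
    print(to_extended_length_path("\\\\nas\\backups\\client"))
```

The helper uses `ntpath` rather than `os.path`, so it behaves the same on any platform; on Windows the resulting string can be passed to functions such as `os.remove`.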
Well done, Explorer.
My previous announcement that there will be a Berkeley DB backend was too hasty. I ran into some (for now) unsolvable problems. I posted them into the official Oracle forums but seem to get no reply there:
The final nail in the coffin was that the advertised increased concurrency was not present. In my tests it performed even worse than SQLite in WAL journal mode. That and the perceived instability (I had a database corruption once) cast a pretty bad light on Berkeley DB. Maybe the SQL layer for Berkeley DB which I used is not stable yet?
On the plus side the tables I had to denormalize cause a significant speed increase for SQLite as well, so all this work was not for nothing.
Maybe I will revisit Berkeley DB in a few months/years.
I finally bit the bullet and worked on MSI installers for Windows. As anticipated it was not easy. I used WiX.
They do have some advantages over an installer distributed as “exe”:
- One can add the Microsoft Visual Studio runtime as a “merge module” thus avoiding starting it in the installer manually
- Apparently centralized installation on domain computers is easier
On the negative side:
- No shared 32/64 bit MSIs are possible. That means the user has to select the right one before downloading
- You cannot add custom commands as easily as in NSIS
I think I will only publish 64bit MSIs for now. Most Windows Servers should be 64bit now anyway and I will still publish the “old” installers for users of older and 32bit systems.
In the upcoming version of UrBackup Server you will be able to choose BerkeleyDB as database backend instead of SQLite. I am still deliberating whether to make it the default on Windows. It will not be the default on Linux, though, as the BerkeleyDB version UrBackup needs is not yet available in, e.g., Debian stable.
The advantage of BerkeleyDB over SQLite is that it is built for higher concurrency. So if you want to have a lot of simultaneous backups you should definitely use it. The new SQLite compatibility layer of BerkeleyDB also made it very easy to add that alternative backend. (The BerkeleyDB people do not like you calling it a backend. They say it is a SQLite frontend for BerkeleyDB.)
It is not as robust as SQLite though. For example it has some problems when the filesystem the database is saved on is at its capacity limit. In my case it slowed down to a crawl. Also if you do not adjust the database parameters correctly it may throw “out of memory” errors. I am still testing what the correct parameters are. If you just set them really high it needs a lot of memory.
For example it said “Lock table out of lock entries” but increasing the number of lockers such that this error did not occur any more resulted in 1GB more memory usage. This is simply too much. I then tracked the problem down to a table join in which BerkeleyDB seems to need a disproportionately large number of locks. Denormalizing a table, such that this join is no longer necessary solved that problem, I think.
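What denormalizing a table to avoid a join looks like can be shown with a small SQLite example. The tables `backups` and `files` and their columns are invented for this sketch and are not UrBackup's real schema.

```python
import sqlite3


def create_schema(conn):
    """Normalized layout: the client is only known via the backups table."""
    conn.executescript("""
        CREATE TABLE backups (id INTEGER PRIMARY KEY, client_id INTEGER);
        CREATE TABLE files (id INTEGER PRIMARY KEY, backup_id INTEGER, path TEXT);
    """)


def files_of_client_join(conn, client_id):
    """Query on the normalized tables; needs a join."""
    rows = conn.execute(
        "SELECT f.path FROM files f JOIN backups b ON b.id = f.backup_id "
        "WHERE b.client_id = ? ORDER BY f.path", (client_id,))
    return [row[0] for row in rows]


def denormalize(conn):
    """Copy client_id into files so that the join is no longer necessary."""
    conn.execute("ALTER TABLE files ADD COLUMN client_id INTEGER")
    conn.execute(
        "UPDATE files SET client_id = "
        "(SELECT client_id FROM backups WHERE backups.id = files.backup_id)")
    conn.execute("CREATE INDEX files_client_id ON files (client_id)")


def files_of_client_flat(conn, client_id):
    """Same result from the denormalized table, touching one table only."""
    rows = conn.execute(
        "SELECT path FROM files WHERE client_id = ? ORDER BY path", (client_id,))
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.executemany("INSERT INTO backups VALUES (?, ?)", [(1, 10), (2, 20)])
    conn.executemany("INSERT INTO files (backup_id, path) VALUES (?, ?)",
                     [(1, "a.txt"), (1, "b.txt"), (2, "c.txt")])
    denormalize(conn)
    print(files_of_client_join(conn, 10) == files_of_client_flat(conn, 10))
```

The price is redundancy: `client_id` is now stored once per file and has to be kept consistent whenever rows are inserted.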
BerkeleyDB may also be slower in situations where no concurrency is involved, as it has a much more fine-grained locking system and locks too much in such situations. So using it with only a few clients will cause unnecessary locking overhead.
I plan on automatically converting the database as soon as the BerkeleyDB plugin is loaded into UrBackup Server. So the only inconvenience should be a long database reconstruction time during which the database is converted.
The table denormalization and some index rebuilding will also take place on upgrade. This also took a lot of time on my test system.
Once you have switched, converting back will be a manual job. I think I’ll have to write in detail about that once the new version is released.