Progress update on the next major UrBackup version

Currently the next major UrBackup version is kind of close to getting finished. There are a few major work in progress areas. Once they are finished and I have done some overall testing I will release a beta version.

The major changes in the new version are:

  • Completely reworked the file deduplication and file backup statistics calculation. This should be much faster, scalable and reliable now.
  • The Copy-on-Write image backups on btrfs mentioned in the last post. Synthetic full backups for the VHD/VHDZ file format and settings for basing image backups on the last full or last incremental backup (differential/incremental).
  • File backups include file metadata including file modification time, ACLs, alternate data streams etc.
  • Backup of streaming data. E.g. the output of “mysqldump”/”pg_dump”. I plan to add basic backup scripts for popular Open Source databases to the client
  • New file restore feature which restores file backups and properly restores the file meta-data
  • The ACLs/file permissions are used to enable users to directly access backups on the web interface from the explorer on the clients (via right click -> Access/Restore backups)
  • Proper backups of symbolic links. Symbolic links which point to folders/files which are backed up are backed up as symbolic links and symbolic links which point outside of the selected backup set are followed/not followed depending on a setting
  • The web interface has been bootstrapified (http://getbootstrap.com/ – mombojuice did the work) and looks much more modern now
  • Simultaneous image and file backups
  • Backup and restore EFI boot sector and partition on UEFI systems. Restore CD that boots with UEFI firmware
  • Client for Mac OS X
  • Forward secrecy for Internet clients via ECDH and Internet client security improvement by using AES-GCM
  • Switch from DSA to ECDSA for client update and server identity signatures

Still to do:

  • Lot’s of testing and bug fixing
  • Backup and restore of file meta-data on Mac OS X and Linux
  • Symbolic link backup handling on Mac OS X and Linux
  • Automatic client update for Mac OS X like for Windows
  • Restoring files which are in use on Windows (via restarting)
  • Update Documentation

Remaining UrBackup limitations (to be done after with a subsequent version):

  • Recognize hard links and backup the files only once
  • Backup only used areas of sparse files
  • Continuous file backup

Once the “to do”s are done this is a big step forward for UrBackup. Having streaming file backups and incremental, differential, synthetic full and full image backups basically allows you to implement pretty much every backup strategy with UrBackup. The only thing missing is the continuous file backup and I already started work on that.

For example you could use UrBackup instead of TimeMachine on Mac OS X and do a full system restore via the file restore feature (this is not implemented at all – it is just an example what it might be able to do). You probably don’t even need an image of your Windows system partition but can restore it via the file backup restore (albeit inefficiently, because the hard links in C:\windows\winsxs are not handled properly).

Using btrfs on Linux to store raw image files in an efficient manner

Currently UrBackup is storing images of volumes as VHD (Virtual HardDisk) files. In UrBackup Server 1.4 the VHD file format was augmented with a custom compression. This combination gives following advantages over storing images as „raw“ files:

  1. Only used parts of the volumes are stored to the VHD files (sparse image)
  2. If the compression is enabled, the volume data is compressed
  3. Incremental image backups: A VHD file can reference another VHD file, so unchanged data does not have to be stored twice (differential image)

All this can be implemented with btrfs. Volume images can then be stored as raw files:

  1. There is a Linux system call to „punch holes“ into a file. Btrfs supports this. Using hole punching only the parts of the raw image file which are used are stored
  2. Btrfs has a compression feature
  3. You can copy a file on btrfs without copying the data the files contain. You can do this in a Linux console via „cp –reflink=always“. A file copied this way only adds another reference to the file data. The file only becomes a “real” copy at changed parts of the file (Copy on Write).

An incremental image backup can then be done like this: Create reflink of the last image backup, change the parts in the image file that have changed since the last image backup and then punch holes into the file to remove unused parts.

As you can see, this moves the implementation of compression, sparse image support and differential image support into btrfs, which is a good thing, because btrfs is starting to get widespread adoption.

The implementation using btrfs has several advantages:

  • The volumes are stored as raw files, which makes inter-operation easier. For example you can directly map the file to a Linux volume device, which makes the file look like a volume on a hard drive.
  • UrBackup can delete any image in the incremental backup chain and btrfs takes care of cleaning up the unused blocks. This makes running infinite incremental image backups viable. After the initial full backup you only need to make incremental image backups

The only disadvantage I saw till now is that UrBraw-copy-on-write-fileackup cannot accurately display the size of incremental image backups anymore. It always shows the size as if it was an uncompressed full image.

This new image file format will be available in UrBackup 1.5 if UrBackup is configured such that it uses btrfs.

Performance considerations for larger UrBackup server instances

If you are planning on setting up a larger UrBackup server instance you will find some hints about performance in this post. “Large” in this context is difficult to define, because it depends on the number of clients, the number of files and file sizes in the backups and the backup intervals.

If you plan on setting up a “larger” UrBackup instance you should keep the following things in mind:

  • UrBackup uses an internal (SQLite) database. This database could cause performance problems, especially with a large number of backed-up files and full file backups. The database should be stored on storage suited for databases.
  • UrBackup has some tuning options, but is nevertheless pretty optimized per default. You should only have to tune UrBackup in special circumstances. You will find information about the tuning options in the administration manual.
  • There are many platform options and each has its own considerations, so you should read up on the platform-specific performance considerations. For example you should not run FreeNAS virtualized.

Your system will almost certainly be IO limited. If you have a running system you can verify that by looking at the performance monitor on Windows, iostat on Linux and zfs iostat on FreeBSD. Often it is limited by random read/write performance (Input/Output operations per second).

If you want maximum IO performance following should therefore be the case:

  • The UrBackup database should be on an SSD. This should be a no-brainer, as this database does not get too large and SSDs are way faster than spinning disks. The random reads/writes are for example 900 times faster with a Samsung 840 Pro (97K IOPS).
  • The UrBackup database should not be on the same disk as the backup storage.
  • The UrBackup database should not be on a RAID-5, as this is not optimal for databases.
  • If the database is still the bottleneck (because it is a separate device you can find this out using iostat or an equivalent), you can use the “file entry cache” (see manual). This cache should be on a separate SSD, otherwise it will only cause more IO on the one device.
  • Save the filesystem metadata of your backup storage on an SSD and only the actual data on a spinning disk RAID-5/6 to get the maximum performance. This is only possible with btrfs on Linux.
  • Avoid full file backups. When doing a full file backup UrBackup has to load all files, calculate their hash value and look this value up in the database. This incurs a lot of IO on both the database and the backup storage (and the client). UrBackup can run an infinite amount of incremental file backups without any full file backups.
  • Optimize the maximum number of concurrent backups such that the throughput is maximized.

Ways to absolutely kill performance:

  • Save the UrBackup database on ZFS/btrfs on a spinning disk. Databases (those who use WAL or intent logging; including SQLite3) on ZFS/btrfs are a known pathological case for copy-on-write file systems. The database files get horribly fragmented. Btrfs has a (currently disabled per default, because it is not stable) background defragmentation for that, but ZFS does not.
  • Save the UrBackup database on a RAID-5. See http://www.baarf.com/.
  • A lot of full file backups.
  • Enable ZFS deduplication without having enough RAM for the dedup table. See here for a discussion.

New Features in UrBackup Server 1.3

It has been a while and there are now a lot new features in UrBackup Server 1.3.

Users of the web interface can download a Client specific installer directly from the server now. The installer has the UrBackup Server information embedded, such that the client automatically connects to the server once it is installed. I’ve also published a script that connects to a UrBackup server, creates a client named like the local computer and then downloads and executes the client installer. This enables a one click setup experience for Internet clients.

The new live log lets you see what the UrBackup server is currently busy with. You can either see all debug level log messages or client specific log messages.

 

 

You can see which files the UrBackup Server is currently working on and the usual log messages which you can also view afterwards via web interface or on the client.

 

A “hidden” feature is now accessible via web interface: You can disable any type of backup for any client.


More fine grained permissions
for the client allow you to prevent the users from starting full file backups, but still allow incremental file backups.

 

The soft client quota allows you to limit the amount of storage each client can use. During the nightly cleanup UrBackup deletes the client’s backups until the storage usage is within the bounds prescribed by the soft client quota. Other than a percentage value you can also use something like “20G” as soft client quota.


You can now have separate backup windows for incremental/full file/image backups.

 


Client-side file hashes
prevent the re-transfer of files that are already on the server, e.g., because another client has the same file. In some situations this drastically reduces the bandwidth requirements and speeds up file backups over Internet.


If you have performance problems with file backups the new file entry cache may help you. If the file entry cache is enabled file entries (a mapping from file hash to file paths) are cached in a separate database which may speed up backups. The caches are automatically created and destroyed if this setting is changed (and the server restarted), but creation may take a long time. LMDB makes heavy use of memory mapped files and is therefore only advisable on a 64bit operating system. It does also create a very large sparse file on Windows. When in doubt use the SQLite cache.

 

 

Backup server for the raspberry pi

Binaries of UrBackup for Raspian are available now here (Update: See the nighlty builds or the “wheezy” folder of Debian of the official releases). Performance isn’t that good on the raspberry pi, obviously. I’d avoid using any kind of compression, for example.

Setup instructions:

First setup the backup storage. I suggest using btrfs. Plug in a USB-hard disk and create a partition, e.g. using cfdisk. Then install btrfs and create the filesystem:

apt-get update
apt-get install btrfs-tools
mkfs.btrfs /dev/sda1
mkdir /media/backup

Add an entry to /etc/fstab to automatically mount the USB-hard disk:

/dev/sda1 /media/backup btrfs defaults,nossd 0 0

Then mount the hard disk:

mount /media/backup

Download the UrBackup server package and install it:

wget http://sourceforge.net/projects/urbackup/files/Server/1.2RC/urbackup-server_1.2-1_armhf.deb/download
dpkg -i urbackup-server_1.2-1_armhf.deb

Resolve the dependencies:

apt-get -f install

Enter e.g. /media/backup/urbackup when it asks for the backup storage path.

UrBackup server should run now and automatically backup clients as soon as you install the UrBackup client on one of your PCs in the local network (from here). If you want to change any settings you should go to the web interface. Let 192.168.0.42 be the IP of your raspberry pi. You can then access the web interface of UrBackup via typing http://192.168.0.42:55414 into your address bar. If you want to access your files you should share /media/backup/urbackup e.g. via samba. There are guides on how to install samba on a raspberry pi (here for example).

Sidenote: I’ve heard that it runs even better on a Cubieboard. It has a SATA-port so the USB-Bottlenet is gone. Unfortunately, they’re even harder to get than Raspberry Pis.

File Backup Storage with Btrfs Snapshots

The Linux kernel 3.6 finally added a feature to btrfs which I needed to easily implement a snapshot file backup storage. This feature will be automatically enabled if the backup storage path points to a btrfs file system and btrfs supports cross-sub-volume reflinks.

A backup with snapshots works like this:

  • Every file backup is put into a separate sub-volume, i.e., a file backup with a name like “121102-1010” is not a normal folder any more. It is a sub-volume in the btrfs file system.
  • Btrfs can make snapshots of sub-volumes. On an incremental backup UrBackup creates a snapshot with the usual name like “121102-2210”. The snapshot has the contents of the last file backup. This snapshot operation is very fast, because no files have to be copied or file entries created. Btrfs does not internally create another copy of the file tree. It only creates copies of parts that we change in the future.
  • UrBackup deletes all the files in the snapshot that have changed or have been deleted since the last backup.
  • Then UrBackup loads the files that have changed or are new. If a file was changed UrBackup only writes changed parts of the file causing btrfs to only save the blocks in the file that are actually different.
  • Once the backup has to be removed UrBackup simply deletes the whole sub-volume.

This has several advantages:

  • Snapshot creation is very fast, causing faster incremental backups.
  • Deleting the file backups is way faster because not every file has to be deleted.
  • Only data that has changed in files between incremental backups is stored in the file system. This drastically reduces the storage requirements, e.g, for large database files.

The cross-device reflinks enable UrBackup to store same files which occur on different sub-volumes only once. On other filesystems this is done using hard links, but those only work on the same file system. Because the Linux kernel sees the sub-volume as a different file system the cross-device reflinks have to be used. There is no disadvantage in using them instead of hard links.

Btrfs: Experimental or approaching stable?

Thinking that btrfs should approach a stable status by now, I used it as file system for the backups on my primary test server. Well, it works well and is really fast. Compression seems to work, too.
But there was a unexpected warning during one backup. It could not create a hard link. In this case UrBackup logs a warning and copies the file.
I checked the inode number of the file that failed to link and it was hard linked nonetheless – weird: Btrfs claimed it could not hard link even though it already did it. There were also only about 500 links to that inode. I checked the file system limit for the number of hard links and discovered that btrfs supports as few as 31 hard links to the same inode in one directory.
There is a relatively new patch to raise this limitation http://lwn.net/Articles/498226/, but it has not landed in the kernel yet. This issue does not affect UrBackup that much, but nonetheless, it shows us something about the stability of btrfs.
Thus my advise: Don’t use it yet.

Internet Mode

Currently I’m working on a new internet mode for UrBackup. This means that you will be able to backup clients to a server on the internet with the upcoming new version.
This communication is of course encrypted and authenticated. It uses a shared key encryption with AES256 in CFB mode. It should be easy to configure: You just need to supply the server with its internet name/IP and the ports the clients should connect to. These settings, as well as random keys, are then pushed to the clients via the local (trusted) network. They can be manually entered on the client side as well. Then the key is pushed from the client to the server.

If you are not in the local network the client tries to connect to the internet server, if you entered something (e.g. a dns name or IP address) there. Then both check if they have the same key and if they do have the same shared key a normal connection, like if the client were in the local network, is established and backups can be performed.

I’ll now implement special options for disabling image and full file backups for clients connected via internet. Then I will implement a special, block based file (rsync like) transfer mode which will be used for those clients and which transfers less data in some scenarios.

Then you can look forward to backup archival and more detailed backup retention capabilities, which I’ll be working on next.

Desktop GNU/Linux: You Have to Love It

You know those parents that love their child so much, they do not see how bad their child in reality really is. I think you have to have a similar relationship to GNU/Linux if you really want to use it as your desktop operating system. That does not mean, that it is generally bad. Just like the child it has its strong points. E.g. the kernel. I am an avid fan of Linus Torvals autocratic management of kernel development. And have no doubt, it is autocratic. He decides in which direction the kernel moves and the success the kernel has had, is in my opinion, largely caused by his pragmatic style.
One could say that the success of the kernel part of GNU/Linux was caused by his strong leadership. And in areas where the operating system does less well, there is a lack of leadership.
For example the window managers. Mainly there is KDE and Gnome. They have different UI frameworks and it is already kind of sacrilegious to use a KDE application in Gnome, because it uses more memory. But additionally to that, this application won’t have the same style.
Of course they have different systems for start menu entries, tray icons, settings and pretty much everything you can think of. Thankfully there is kind of a standardization body named FreeDesktop.org. The problem – as with every standardization process – is that it moves slowly and the resulting standard does not define all useful scenarios. Thus the new features are sometimes still not accessible in a common way.
We speak here of a fragmentation within the operating system: In order to make a GNU/Linux application which uses UI and has a native look&fell you need to do everything twice now. Once with GTK and once with Qt (used by KDE).
But this does not end there: You have to think of the zillion other window managers out there. XFCE, Unity, Fluxbox you name them. Thankfully most are based on either GTK or Qt. Nevertheless: In each one of those, your application may not display its tray icon correctly.
And as you perhaps know: The UrBackup Client displays a tray icon.

Anticipating all these complications I am using a cross platform toolkit for the UrBackup Client: wxWidgets. Theoretically it is available for both Qt and GTK. As every level of abstraction this gives you slightly less power, but the application is simple right?
Well, try to show such a balloon popup on windows and we can talk. But otherwise it really worked mostly well.
So I compiled the client in Debian and checked if everything was working. And it did. Then – to test it in a more popular desktop distribution – I downloaded Ubuntu.
The tray icon did not show up. Turns out Unity has a whitelist of apps that can show tray icons. This annoys many users as e.g. Skype won’t show up any more. You can of course allow tray icons by editing some arcane setting somewhere. But this is not something the end-user should do right?
But it gets worse: After I edited that setting to allow all applications, it still did not work.
Turns out they did not like the FreeDesktop.org standard any more and made their own. In order for it to work I would have to use a separate library (libappindicator) to display the tray icon. Libappindicator only works with Unity on Ubuntu, so I would have to make and release a different version of my application for Ubuntu. Not acceptable.
I’ll repeat: I’m using libwxgtk2.8 which is officially part of Ubuntu to display a tray icon. This does not work because wxWidgets uses the FreeDesktop.org protocol to display the tray icon which Ubuntu decided to abandon. The wxWidget guys (understandably) seem to not want to fix that issue in wxWidgets 2.9, probability because they do not want to implement something only for Ubuntu, as well.
Simultaneously the Ubuntu fork Mint which does not use Unity is becoming more popular. So perhaps this specific problem will resolve itself this way. This issue certainly seems to have caused some waves: http://blogs.gnome.org/bolsh/2011/03/11/lessons-learned/
Bottom line of that article is that the FreeDesktop.org standardization process is broken. And this is just one example of the kind of fragmentation we developers have to think about.
Compare that to Windows where the program that displayed tray icons in Windows 95 probably still can display it in Windows 7. After 11 years! Too easy.

I said at the beginning that a strong leadership is needed just like for the Linux kernel. This strong leadership would have to coordinate efforts in different window managers and in different distributions. This is difficult because the distribution is the thing the users sees and holds responsible for something that is not working. This is also the reason why you install packets from your distribution. Because doing it any other way probably causes something to not work or even break. Because only distributors are accountable it is very difficult to establish something like FreeDesktop.org – an inter distribution standardization body. There is simply no incentive to play along nicely, especially since standardization processes tend to be lengthy and difficult and you want your distribution to be progressive and modern.
The only way I see this dilemma could be solved, is by having one distribution which the majority of GNU/Linux (desktop) users use. I hoped this could be Ubuntu lead by Mark Shuttleworth. But Ubuntu is sadly moving into the wrong direction at the moment.
Additionally to that. It is moving too fast. Given some time and persuasion Gnome probability would have adapted the libappindicator interface and I would not have this problem now.
Unity is – in my opinion – really crappy. I did not even find out how to switch applications without alt+tabbing and had to use the windows key to start one (start menu where are you?). If this is someone’s idea of usability and end user friendliness then I give up all remaining hope for Ubuntu.

Given all that, I have decided to not start building any packages for Ubuntu/Debian. The support matrix would just be too great and I already named one issue. If you really love Linux that much that you use it as a desktop operating system, I leave you to grab the source code and build it yourself – no guarantees that it works on your specific distribution in your window manager. Thankfully the back-end part – the part that does the backups – is not dependant on any flaky UI/Window manager stuff and so should be there to stay. If the frontend does not work for you (aka it does not display the tray icon) you can always set the directories it backs up on the server.
I hope that some time in the future someone from a distribution picks up that code and builds working packages for that distribution. But that someone won’t be me. This far, and no further! Sorry, Linux. I will still love you. But only as my server child. Not a desktop one.