Btrfs file system stability status update

I wrote before in a blog entry about btrfs stability. That was almost exactly seven years ago. The issue I had back then (a limit on the number of hard links in the same directory) was fixed a long time ago.

Since then btrfs has come a long way. The most significant development was that SUSE started using btrfs as root file system and put several developers to work on finding and fixing bugs. At the same time, only a few new features were added. This significantly improved stability (at least for the root file system use case).

As just hinted, the main problem is that each developer only looks after their own set of use cases. Developers from Facebook, for example, mainly look at their use case of running btrfs on their HDFS storage nodes. They have redundancy etc. at a higher level, so they are probably using btrfs only with single disks and focusing on making btrfs performant and stable within this use case. SUSE, as said, uses it as root file system (system files), where there are single disks or, at most, mirrored disks. File system repair isn’t too important there, as those file systems don’t get very large and contain non-unique data.

Compared to that, when using UrBackup with btrfs:

  • Storage is for backup and/or long term archival (RAID6 etc. would be appropriate)
  • Can have a lot of threads writing data to the file system at once
  • Can have combination of threads writing lots of small files and large files (image backups)
  • Lots of snapshots (backups) of different sub-volumes, some of which can persist for a long time
  • Deduplication (at least file level)
  • Can’t simply reboot the server when it gets stuck, like you could with e.g. an HDFS node
  • UrBackup imposes a database style workload (db for deduplication), where latency is important as well as the backup workload, where throughput is important
  • File system can get very large and it is very time consuming to move data from a damaged (read-only mounted) file system to a new one

All of this combined was guaranteed to cause problems when seriously using btrfs as UrBackup storage. The most persistent issue was premature ENOSPC, where btrfs reports an out-of-space error and remounts the file system read-only, even though there is still a lot of free space (both for data and metadata) available on the disk(s). The problem seems to be solved (on the systems I observe) with Linux 5.1 or Linux 4.19.x with this patch. RAID56 is still officially “unstable”.

Btrfs isn’t alone in having issues with this workload. When using UrBackup seriously (a large number of clients with many simultaneous backups) e.g. with ZFS, I experienced a lot of different issues with ZFSOnLinux (especially in connection with memory/thread management), making ZFSOnLinux unusable. ZFS on FreeBSD was a little better, but there were also issues that occurred about once per week, causing hangs.

Btrfs also isn’t the only part of the Linux storage stack that needed improvement. For example, writeback throttling was significantly improved a few years ago. This improves memory management, making Linux better able to handle the mixed database and archival workload mentioned above. That is not to say that all errors are fixed. For example, I recently learned that a call to synchronize the file system to make sure that data is actually written to disk does not return errors on Linux. There is now a work-around for that in UrBackup, at least for btrfs, but there hasn’t been a general fix on the Linux side yet.

Another important consideration is performance. One thing to keep in mind w.r.t. btrfs is that there is a feature, or rather trade-off, where there is a back-reference from each disk data location to each file/item using that disk data location. So you can ask btrfs to list all files having data at e.g. disk offset 726935388160, and it will do so fast compared to other file systems like EXT, ZFS, NTFS, etc., which would have to scan through all files. Managing the backref metadata comes at the cost of having more metadata in general, though. Operations such as writing new data to files, deleting files, deleting snapshots, etc. become a bit slower, as the backref metadata has to be added or deleted in addition to the (forward) metadata pointing from file to data. Btrfs goes to great lengths to make this fast, though (delayed backrefs etc.). There are a number of unique btrfs features which could be implemented because of the presence of back-references (a lookup example follows the list below):

  • File system shrinking
  • Device removal
  • Balancing
  • If there is a corruption, it can show all the files (and offsets into files) affected
  • Quotas
  • Btrfs send
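
For illustration, this is roughly how such a reverse lookup looks with btrfs-progs (the offset and mount point are placeholders):

# Resolve a logical disk offset back to the files referencing it.
# Fast on btrfs thanks to back-references; other file systems would
# have to scan all files to answer this.
btrfs inspect-internal logical-resolve 726935388160 /mnt/backups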

The performance problem with backrefs is that if a file has a lot of reflinks or snapshots, it may have a lot of backrefs. Running one of the operations above then involves a lot of iteration through all backrefs, making the operation (unusably) slow. UrBackup naturally creates a lot of snapshots and may reflink certain files many times. The work-around (at least for me) is to avoid the above operations at all costs and to patch out the reflink walking in btrfs send.

Conclusion

Should you use btrfs with UrBackup? It is certainly the file system with which UrBackup works best. Beyond that, you’d have to check whether you can meet the following conditions:

  • A Linux kernel of at least 4.19.x (I suggest using the latest LTS kernel). If you have ENOSPC issues with 4.19.x, apply the patch mentioned above.
  • The ability to avoid the operations listed above, such as file system shrinking or device removal
  • The ability to avoid btrfs RAID56 (i.e. use RAID1(0) instead, or use btrfs in single mode on top of a storage layer that provides integrity, such as Ceph or the UrBackup Appliance)
  • ECC memory (which I’d suggest in any case), as repairing btrfs once it is damaged is mostly not possible

Connect clients with a HTTPS CONNECT web proxy

With 2.4.x you can use UrBackup with an HTTPS proxy. This way you can have the web interface and the clients connecting on the same port, secured by the same transport encryption (SSL). This post shows how to do this in combination with the Apache web server.

The idea is that the client connects to the web server and issues an HTTP CONNECT request for the actual UrBackup server.
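
On the wire, the client’s request to the web server looks roughly like this (standard HTTP CONNECT syntax):

CONNECT 127.0.0.1:55415 HTTP/1.1
Host: 127.0.0.1:55415

After a “200 Connection established” response, the connection is a plain tunnel to the UrBackup server.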

First, enable the CONNECT proxy module in Apache. On Debian via

a2enmod proxy_connect  

Then allow connections to the UrBackup server Internet port by adding

AllowConnect 55415

to your apache configuration.

Next, in your Apache virtual host configuration, set proxy options such as the timeout, allow proxy connections to the UrBackup server, and disallow them to every other host:

ProxyTimeout 600
ProxyRequests On
# Allow CONNECT to the UrBackup Internet port only
<Proxy 127.0.0.1:55415>
</Proxy>
# Deny proxying to every other destination
# (the Order/Deny syntax requires mod_access_compat on Apache 2.4)
<ProxyMatch ^(?!127.0.0.1:55415$).*$>
    Order Deny,Allow
    Deny from all
</ProxyMatch>

Then go to your UrBackup server web interface, set your web server URL as the Internet client proxy (https://example.com) and the Internet server name/IP to 127.0.0.1. Internet clients should then start connecting via your web server to your UrBackup server. Once all clients connect this way, you could turn off UrBackup’s built-in Internet transfer encryption and rely on SSL.
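
One way to check that the tunnel works is with curl (assuming a curl build with HTTPS proxy support; the hostname is an example):

# Force a CONNECT tunnel through the HTTPS proxy to the UrBackup Internet port
curl -v --proxytunnel --proxy https://example.com telnet://127.0.0.1:55415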

Fixing client IP addresses

You may notice that on the status page all Internet clients now show the IP address of your web server as their IP address. Fixing this is a bit difficult, as there is no standard way to forward the client IP address information from the web server (compared to a normal HTTP proxy, where there is an X-Forwarded-For header). So a bit of hacking is in order. I modified the mod_proxy_connect Apache module to forward the client IP information in a 50 byte buffer to the backend: mod_proxy_connect.c
On Debian you could replace your original mod_proxy_connect with the modified one via the following commands:

apt install apache2-dev
wget https://gist.githubusercontent.com/uroni/143c0d7ed6169e89f2d6c59a870dd4cc/raw/28dd30b1f82938777c504f2afdc5f162fd91b3fd/mod_proxy_connect.c
apxs -i -a -c mod_proxy_connect.c

Then in the UrBackup server advanced global settings set “List of server IPs (proxys) from which to expect endpoint information (forwarded for) when connecting to Internet service (needs server restart)” to include your web server IP (127.0.0.1 in the example here). After a server restart you should be able to see the actual client IP instead of the web server IP on the status screen.

Fixing SNI errors

If you have multiple virtual hosts with SSL, there is an issue with SNI. Apache automatically compares the hostname in the CONNECT request with the server name in the SSL connection (SNI) and rejects the request if they differ. The only solution (or ugly hack) I found was to add the hostname with the target IP to /etc/hosts and then use the hostname instead of the IP in the CONNECT request. I.e., add “127.0.0.1 example.com” to /etc/hosts, then replace 127.0.0.1 with example.com in all of the configuration above.
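
For example (hostname is a placeholder):

echo "127.0.0.1 example.com" >> /etc/hosts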

Additional proxy authentication

As an additional security layer, one can require proxy authentication. Clients then need to know a username and password to get through the web server to the UrBackup server. With apache2, e.g.:

 htpasswd -c -b /etc/apache2/urbackup_password urbackup passw0rd 

Then modify the proxy section to:

<Proxy 127.0.0.1:55415>
    AuthType Basic
    AuthName "Restricted UrBackup"
    AuthBasicProvider file
    AuthUserFile "/etc/apache2/urbackup_password"
    Require user urbackup
</Proxy>

Afterward, add the username and password to the proxy URL, e.g. https://urbackup:passw0rd@example.com

Image mounting in UrBackup Server 2.1.x

UrBackup 2.1.x can now mount image backups. That is, it lists the image backups it has on the web interface, and you can browse into them and e.g. download files or directories as a ZIP file.
I am particularly proud that this works on Linux as well as on Windows, with both raw image files and VHD(z) files. On FreeBSD it currently only works with raw image files.

On Linux it uses libguestfs-tools to mount images in a sandboxed virtual machine. On Windows/FreeBSD there is no such sandbox, so mounting a hostile image may be a dangerous operation there.
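
For reference, this is roughly what such a sandboxed mount looks like when done manually with libguestfs-tools (the image path, guest device and mount point are examples; a raw volume image appears to the appliance as an unpartitioned disk):

# guestmount wraps the image in a small appliance VM, so a malicious image
# cannot attack the host kernel's file system drivers
guestmount -a /backups/client1/Image_C.raw -m /dev/sda --ro /mnt/image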

How I backup LVM volumes on my Xen server

I’ve got a Xen server which runs a couple of Linux and Windows VMs. The VMs are stored in LVM volumes on a LVM volume group which is on a bcache device. The bcache device consists of a mirrored SSD pair (using mdraid) as cache and a mirrored HDD pair (also using mdraid) as backing storage. The SSD caching gives a nice performance boost, but nowadays I would go with SSD storage only, because bcache caused some problems (did not play nice with udev during boot).
The Windows VMs are backed up by installing the UrBackup client in the VMs. To restore I’d need to boot the restore CD in Xen or restore the Windows images via command line in the hypervisor.
The Linux VMs are backed up at the hypervisor level in the Xen dom0 (which is Debian in this case) using LVM snapshots. To create and remove LVM snapshots I have the following snapshot creation and removal scripts (the volume group on which the volumes reside is mirror-vg).

Snapshot creation script at /usr/local/etc/urbackup/create_filesystem_snapshot:

#!/bin/bash
set -e
# UrBackup passes the snapshot uid as argument 1 and the volume name as argument 5
SNAP_UID=$1
VOLNAME="$5"
VGNAME="mirror-vg"
if [[ $VOLNAME == "" ]]; then
        echo "No volume name specified"
        exit 1
fi
if [[ $VOLNAME == "other-data" ]]; then
        VGNAME="data2-vg"
fi
if [[ $SNAP_UID == "" ]]; then
        echo "No snapshot uid specified"
        exit 1
fi
export LVM_SUPPRESS_FD_WARNINGS=1
# Create the snapshot, giving it all remaining free space in the VG for CoW data
lvcreate -l100%FREE -s -n "$SNAP_UID" "/dev/$VGNAME/$VOLNAME"
SUCCESS=0
# Remove the snapshot again if mounting it below fails
trap 'test $SUCCESS = 1 || lvremove -f /dev/$VGNAME/$SNAP_UID' EXIT
mkdir -p /mnt/urbackup_snaps/${SNAP_UID}
mount -o ro /dev/$VGNAME/$SNAP_UID /mnt/urbackup_snaps/${SNAP_UID}
SUCCESS=1
echo "SNAPSHOT=/mnt/urbackup_snaps/$SNAP_UID"
exit 0

Snapshot removal script at /usr/local/etc/urbackup/remove_filesystem_snapshot:

#!/bin/bash
set -e
SNAP_UID=$1
SNAP_MOUNTPOINT="$2"
if [[ $SNAP_UID == "" ]]; then
        echo "No snapshot uid specified"
        exit 1
fi
if [[ "$SNAP_MOUNTPOINT" == "" ]]; then
        echo "Snapshot mountpoint is empty"
        exit 1
fi
if ! test -e $SNAP_MOUNTPOINT; then
        echo "Snapshot at $SNAP_MOUNTPOINT was already removed"
        exit 0
fi
if ! df -T -P | egrep "${SNAP_MOUNTPOINT}\$" > /dev/null 2>&1; then
        echo "Snapshot is not mounted. Already removed"
        rmdir "${SNAP_MOUNTPOINT}"
        exit 0
fi
# Newer lsblk versions support --paths; test for that first
if lsblk -r --output "NAME,MOUNTPOINT" --paths > /dev/null 2>&1; then
        VOLNAME=`lsblk -r --output "NAME,MOUNTPOINT" --paths | egrep " ${SNAP_MOUNTPOINT}\$" | head -n 1 | tr -s " " | cut -d" " -f1`
else
        VOLNAME=`lsblk -r --output "NAME,MOUNTPOINT" | egrep " ${SNAP_MOUNTPOINT}\$" | head -n 1 | tr -s " " | cut -d" " -f1`
        VOLNAME="/dev/mapper/$VOLNAME"
fi
if [ "x$VOLNAME" = x ]; then
    echo "Could not find LVM volume for mountpoint ${SNAP_MOUNTPOINT}"
    exit 1
fi
if [ ! -e "$VOLNAME" ]; then
    echo "LVM volume for mountpoint ${SNAP_MOUNTPOINT} does not exist"
    exit 1
fi
echo "Unmounting $VOLNAME at /mnt/urbackup_snaps/${SNAP_UID}..."
# Retry once after 10 seconds in case the file system is still busy
if ! umount /mnt/urbackup_snaps/${SNAP_UID}; then
        sleep 10
        umount /mnt/urbackup_snaps/${SNAP_UID}
fi
rmdir "${SNAP_MOUNTPOINT}"
echo "Destroying LVM snapshot $VOLNAME..."
export LVM_SUPPRESS_FD_WARNINGS=1
lvremove -f "$VOLNAME"

The snapshot scripts are specified via the file /usr/local/etc/urbackup/snapshot.cfg:

create_filesystem_snapshot=/usr/local/etc/urbackup/create_filesystem_snapshot
remove_filesystem_snapshot=/usr/local/etc/urbackup/remove_filesystem_snapshot
volumes_mounted_locally=0
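
You can smoke test the scripts manually before wiring them up (argument positions taken from the scripts themselves: $1 is the snapshot uid, $5 the volume name; the values and the unused positional arguments are placeholders):

/usr/local/etc/urbackup/create_filesystem_snapshot testsnap x x x myvolume
/usr/local/etc/urbackup/remove_filesystem_snapshot testsnap /mnt/urbackup_snaps/testsnap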

Then I have a virtual client for each LVM volume that needs to be backed up. I have put those virtual clients into a settings group with the default path to back up set to “/|root/require_snapshot”.

For restore I need to recreate the LVM volume, create a file system on it (e.g. with mkfs.ext4), mount it in the hypervisor, and then restore via:

urbackupclientctl restore-start --virtual-client VOLUMENAME -b last --map-from / --map-to /mnt/localmountpoint

New in UrBackup 2.0.x

Web interface modernization. The web interface was a little utilitarian, which gave many people the wrong impression. With the help of mombojuice the web interface was improved such that it looks much more modern and professional. Many small improvements were made as well. For example, dates are now formatted according to the browser locale, backups can be started via a drop-down menu, and the live log of a running backup can be accessed directly from the activities screen.

Improved file deduplication. Completely reworked the file deduplication and file backup statistics calculation. This should be much faster, more scalable and more reliable now.

File backup improvements. File meta-data such as last modified time and file permissions are now backed up on all supported client systems (Windows, Linux, Mac OS X). With support for more exotic file system features such as sparse files, UrBackup is now a fully featured file backup solution.

File backup restore. To restore the file meta-data, UrBackup now has an integrated file restore. The file restore reuses client-side hashes, if present, and transfers only differences, such that restoring folders with only a few changes since the restored backup is fast.

Direct backup access. If configured, the backed-up file permissions are used to allow clients direct access to their files with only minimal configuration. On Windows there is a shortcut in Explorer which directly opens the relevant file/folder in the browser. There is a new list view which shows a file/folder in all backups. For files, hashes are used to show when the file content changed (versions).

Image backup improvements. UrBackup now supports GPT formatted disks, and the restore CD boots on UEFI firmware devices (also with secure boot enabled). In combination with btrfs, UrBackup supports an incremental-forever style of image backup and image backups over 2TB. For VHD/VHDZ, UrBackup now has settings to base incremental backups on the last or the last full image backup. Full image backups can be configured as synthetic full backups, transferring only changes since the last image backup.

Significant security improvements. Forward secrecy for Internet clients via ECDH, and improved Internet client security by using AES-GCM. Switch from DSA to ECDSA for client update and server identity signatures. Web server/restore CD login now uses PBKDF2.

Mac OS X client. There is now a UrBackup client for Mac OS X. This client is fully featured, excluding image backup (like the Linux client). The Mac OS X client can be used as a technically superior backup solution to Time Machine.

Improved command line. Mainly for Linux, all command line usage has been significantly improved. This includes the command line client (urbackupclientctl), the server command line (urbackupsrv) and the restore client.

Linux file system snapshotting. Snapshotting now also works on Linux and is fully integrated. The portable Linux client includes snapshot scripts for LVM, dattobd and btrfs, which work without changes in most cases.

Lots of other changes. Proper symbolic link backup. Virtual clients allow you to back up different sets of files at different intervals and with different max/min amounts. Simultaneous image and file backups. Different backup speeds and backup intervals at different times. Improved Internet transfer compression. A new hashing method where the server only needs to hash changed parts of a file.

Start of UrBackup 2.0 beta phase

UrBackup Server/Client/Restore 2.0.0 beta was recently released. This marks the beginning of the UrBackup 2.0 beta phase.

See the forums for download links and discussion.

UrBackup 2.0 marks the beginning of UrBackup having no major limitations. If you still find some, please start a discussion in the forums. The next beta version will also properly support sparse file backup.

Progress update on the next major UrBackup version

The next major UrBackup version is getting close to finished. There are a few major work-in-progress areas. Once they are finished and I have done some overall testing, I will release a beta version.

The major changes in the new version are:

  • Completely reworked the file deduplication and file backup statistics calculation. This should be much faster, more scalable and more reliable now.
  • The Copy-on-Write image backups on btrfs mentioned in the last post. Synthetic full backups for the VHD/VHDZ file format and settings for basing image backups on the last full or last incremental backup (differential/incremental).
  • File backups include file meta-data such as file modification time, ACLs, alternate data streams etc.
  • Backup of streaming data, e.g. the output of “mysqldump”/”pg_dump”. I plan to add basic backup scripts for popular open source databases to the client.
  • New file restore feature which restores file backups and properly restores the file meta-data
  • The ACLs/file permissions are used to enable users to directly access backups on the web interface from the explorer on the clients (via right click -> Access/Restore backups)
  • Proper backups of symbolic links. Symbolic links which point to folders/files which are backed up are backed up as symbolic links and symbolic links which point outside of the selected backup set are followed/not followed depending on a setting
  • The web interface has been bootstrapified (http://getbootstrap.com/ – mombojuice did the work) and looks much more modern now
  • Simultaneous image and file backups
  • Backup and restore EFI boot sector and partition on UEFI systems. Restore CD that boots with UEFI firmware
  • Client for Mac OS X
  • Forward secrecy for Internet clients via ECDH and Internet client security improvement by using AES-GCM
  • Switch from DSA to ECDSA for client update and server identity signatures

Still to do:

  • Lots of testing and bug fixing
  • Backup and restore of file meta-data on Mac OS X and Linux
  • Symbolic link backup handling on Mac OS X and Linux
  • Automatic client update for Mac OS X like for Windows
  • Restoring files which are in use on Windows (via restarting)
  • Update Documentation

Remaining UrBackup limitations (to be addressed in a subsequent version):

  • Recognize hard links and back up the files only once
  • Back up only used areas of sparse files
  • Continuous file backup

Once the “to do”s are done, this is a big step forward for UrBackup. Having streaming file backups and incremental, differential, synthetic full and full image backups basically allows you to implement pretty much every backup strategy with UrBackup. The only thing missing is continuous file backup, and I have already started work on that.

For example, you could use UrBackup instead of Time Machine on Mac OS X and do a full system restore via the file restore feature (this is not implemented at all – it is just an example of what it might be able to do). You probably don’t even need an image of your Windows system partition, but could restore it via the file backup restore (albeit inefficiently, because the hard links in C:\windows\winsxs are not handled properly).

Using btrfs on Linux to store raw image files in an efficient manner

Currently UrBackup stores images of volumes as VHD (Virtual HardDisk) files. In UrBackup Server 1.4 the VHD file format was augmented with a custom compression. This combination gives the following advantages over storing images as “raw” files:

  1. Only used parts of the volumes are stored to the VHD files (sparse image)
  2. If the compression is enabled, the volume data is compressed
  3. Incremental image backups: A VHD file can reference another VHD file, so unchanged data does not have to be stored twice (differential image)

All this can be implemented with btrfs. Volume images can then be stored as raw files:

  1. There is a Linux system call to “punch holes” into a file, which btrfs supports. Using hole punching, only the used parts of the raw image file are stored
  2. Btrfs has a compression feature
  3. You can copy a file on btrfs without copying the data the file contains. You can do this in a Linux console via “cp --reflink=always”. A file copied this way only adds another reference to the file data. The file only becomes a “real” copy at changed parts of the file (copy-on-write).

An incremental image backup can then be done like this: create a reflink of the last image backup, change the parts in the image file that have changed since the last image backup, and then punch holes into the file to remove unused parts.
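
A minimal sketch of these steps with standard tools (file names, offsets and lengths are placeholders):

# Share all data with the previous image; only changed parts will be rewritten
cp --reflink=always image_prev.raw image_new.raw
# ... overwrite the blocks that changed since the last backup ...
# Punch a hole over a part of the volume that is no longer in use
fallocate --punch-hole --offset 1048576 --length 4096 image_new.raw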

As you can see, this moves the implementation of compression, sparse image support and differential image support into btrfs, which is a good thing, because btrfs is starting to get widespread adoption.

The implementation using btrfs has several advantages:

  • The volumes are stored as raw files, which makes inter-operation easier. For example, you can directly map the file to a Linux block device, which makes the file look like a volume on a hard drive (see the example after this list).
  • UrBackup can delete any image in the incremental backup chain and btrfs takes care of cleaning up the unused blocks. This makes running infinite incremental image backups viable. After the initial full backup you only need to make incremental image backups.
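
A possible way to do that mapping with a loop device (the image path is an example):

# Expose the raw volume image as a read-only block device and mount it
LOOPDEV=$(losetup --find --show --read-only /backups/client1/Image_C.raw)
mkdir -p /mnt/restore
mount -o ro "$LOOPDEV" /mnt/restore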

The only disadvantage I have seen so far is that UrBackup cannot accurately display the size of incremental image backups anymore. It always shows the size as if it were an uncompressed full image.

This new image file format will be available in UrBackup 1.5 if UrBackup is configured such that it uses btrfs.

Performance considerations for larger UrBackup server instances

If you are planning on setting up a larger UrBackup server instance you will find some hints about performance in this post. “Large” in this context is difficult to define, because it depends on the number of clients, the number of files and file sizes in the backups and the backup intervals.

If you plan on setting up a “larger” UrBackup instance you should keep the following things in mind:

  • UrBackup uses an internal (SQLite) database. This database could cause performance problems, especially with a large number of backed-up files and full file backups. The database should be stored on storage suited for databases.
  • UrBackup has some tuning options, but is nevertheless pretty optimized per default. You should only have to tune UrBackup in special circumstances. You will find information about the tuning options in the administration manual.
  • There are many platform options and each has its own considerations, so you should read up on the platform-specific performance considerations. For example, you should not run FreeNAS virtualized.

Your system will almost certainly be IO limited. If you have a running system you can verify that by looking at the performance monitor on Windows, iostat on Linux and zpool iostat on FreeBSD. Often it is limited by random read/write performance (Input/Output operations per second, IOPS).
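
For example, on Linux (iostat is part of the sysstat package):

# Extended per-device statistics every 5 seconds; watch %util and await
iostat -x 5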

If you want maximum IO performance, the following should therefore be the case:

  • The UrBackup database should be on an SSD. This should be a no-brainer, as this database does not get too large and SSDs are way faster than spinning disks. Random reads/writes are, for example, about 900 times faster with a Samsung 840 Pro (97K IOPS) than with a spinning disk.
  • The UrBackup database should not be on the same disk as the backup storage.
  • The UrBackup database should not be on a RAID-5, as this is not optimal for databases.
  • If the database is still the bottleneck (because it is on a separate device, you can find this out using iostat or an equivalent), you can use the “file entry cache” (see manual). This cache should be on a separate SSD, otherwise it will only cause more IO on the single device.
  • Save the filesystem metadata of your backup storage on an SSD and only the actual data on a spinning disk RAID-5/6 to get the maximum performance. This is only possible with btrfs on Linux.
  • Avoid full file backups. When doing a full file backup UrBackup has to load all files, calculate their hash values and look these values up in the database. This incurs a lot of IO on both the database and the backup storage (and the client). UrBackup can run an infinite number of incremental file backups without any full file backups.
  • Optimize the maximum number of concurrent backups such that the throughput is maximized.

Ways to absolutely kill performance:

  • Save the UrBackup database on ZFS/btrfs on a spinning disk. Databases (those that use WAL or intent logging, including SQLite3) are a known pathological case for copy-on-write file systems. The database files get horribly fragmented. Btrfs has a background defragmentation for that (currently disabled by default, because it is not stable), but ZFS does not. (See the mitigation example after this list.)
  • Save the UrBackup database on a RAID-5. See http://www.baarf.com/.
  • A lot of full file backups.
  • Enable ZFS deduplication without having enough RAM for the dedup table. See here for a discussion.
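
If the database has to live on btrfs, one common mitigation is to mark the database directory NOCOW before any database files are created in it (a general btrfs technique; the path is an example):

# Files created in this directory afterwards are not copy-on-write,
# which avoids the fragmentation described above
mkdir -p /var/urbackup
chattr +C /var/urbackup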