Btrfs file system stability status update

I wrote before in a blog entry about btrfs stability. That was almost exactly seven years ago.The issue I had back then (about a hard link limit of hard links in the same directory) was fixed a long time ago.

Since then btrfs has come a long way. The most significant development was that SUSE started using btrfs as root file system and put several developers to work on finding and fixing bugs. Simultaneously only few new features were added. This significantly improved stability (at least for the root file system use case).

As just hinted, the main problem is that each developer just looks after their set of use cases. So, developers from Facebook mainly look at their use case of using btrfs on their HDFS storage nodes. They have redundancy etc. on a higher level, so they are probably using btrfs only with single disks and focusing on making btrfs performant and stable within this use case. SuSE, as said, uses it as root file system (system files), where there are single disks or, at most, mirrored disks. File system repair isn’t too important, as those file systems don’t get very large and contain non-unique data.

Compared to that when using UrBackup with btrfs:

  • Storage is for backup and/or long term archival (RAID6 etc. would be appropriate)
  • Can have a lot of threads writing data to the file system at once
  • Can have combination of threads writing lots of small files and large files (image backups)
  • Lots of snapshots (backups) of different sub-volumes, some of which can persist for a long time
  • Deduplication (at least file level)
  • Can’t reboot the server, when it gets stuck like you’d be able to with e.g. a HDFS node
  • UrBackup imposes a database style workload (db for deduplication), where latency is important as well as the backup workload, where throughput is important
  • File system can get very large and it is very time consuming to move data from a damaged (read-only mounted) file system to a new one

This, combined, was guaranteed to cause problems when seriously using btrfs as UrBackup storage. The most persistent issue was premature ENOSPC, where btrfs has an error saying it is out of space, remounting the file system read-only, even though there is still a lot of free space (both for data and metadata) available on the disk(s). The problem seems to be solved (on the systems I observe) with Linux 5.1 or Linux 4.19.x with this patch. RAID56 is still officially “unstable”.

Btrfs isn’t alone with having issues with this work load. When using UrBackup seriously (large number of clients with many simultaneous backups) e.g. with ZFS, I experienced a lot of different issues with ZFSOnLinux (especially in connection with memory/thread management), making ZFSOnLinux unusable. ZFS on FreeBSD was a little better, but there also were issues that occurred about once per week causing hangs.

Btrfs also isn’t the sole area where Linux was needing improvement in the storage area. For example, writeback throttling was significantly improved a few years ago. This example improves the memory management, making Linux more able to handle the mixed database and archival workload mentioned above. Not to say that all errors are fixed. For example, I recently learned that a call to synchronize the file system to make sure that data is actually written to disk does not return errors on Linux. There is now a work-around for that, at least for btrfs, in UrBackup but there hasn’t been a fix on the Linux side in general, yet.

Another important consideration is performance. One thing w.r.t. btrfs to keep in mind is that there is a feature, or better trade-off, where there is a back-reference from each disk data location to each file/item using that disk data location. So, you can ask btrfs to list all files having data at e.g. disk offset 726935388160 and it will do so fast compared to other file systems like EXT, ZFS, NTFS, etc. which would have to scan through all files. Managing the backref metadata comes with the cost of having more metadata in general, though. Operations such as writing new data to files, deleting files, deleting snapshots, etc. become a bit slower as the backref metadata has to be added or deleted, in addition to the (forward) metadata pointing from file to data. Btrfs goes to great lengths making this fast, though (delayed backrefs etc.). There is a number of unique features of btrfs which could be implemented because of the presence of back-references:

  • File system shrinking
  • Device removal
  • Balancing
  • If there is a corruption, it can show all the files (and offsets into files) affected
  • Quotas
  • Btrfs send

The performance problem with backrefs is that if a file has a lot of reflinks or snapshots, it may have a lot of backrefs. Running one of the operations above then involves a lot of iteration through all backrefs, making this operation (unusably) slow. UrBackup naturally creates a lot of snapshots and may reflink certain files many times. The work-around (at least for me) is to avoid the above operations at all cost and patch out the reflink walking in btrfs send.

Conclusion

Should you use btrfs with UrBackup? It is certainly the file system with which UrBackup works the best. Outside of that you’d have to see if you can have:

  • Linux kernel at least 4.19.x (I suggest using the latest LTS kernel). If you have ENOSPC issues with 4.19.x apply patch.
  • Able to avoid above operations such as file system shrinking or device removal
  • Able to avoid btrfs RAID56 (i.e. use RAID1(0) instead or use btrfs in single mode on top of a storage layer that provides integrity such as Ceph or the UrBackup Appliance)
  • I’d suggest using ECC memory, as btrfs repair once it is damaged is mostly not possible

Connect clients with a HTTPS CONNECT web proxy

With 2.4.x you can use UrBackup with a HTTPS proxy. This way you can have the web interface and the clients connecting at the same port, secured by the same transport encryption (SSL). This post shows how to do this in combination with the Apache web server.

The idea is that the client connects to the web server and issues a HTTP CONNECT request to the actual UrBackup server.

First Enable CONNECT proxy module in apache. On debian via

a2enmod proxy_connect  

Then allow connections to the UrBackup server Internet port by adding

AllowConnect 55415

to your apache configuration.

Next in your apache virtual host configuration, set proxy options such as the timeout, allow proxy connections to the UrBackup server, and disallow them to every other host:

ProxyTimeout 600
ProxyRequests On  
<Proxy 127.0.0.1:55415>
</Proxy>
<ProxyMatch ^(?!127.0.0.1:55415$).*$>
    Order Deny,Allow
    Deny from all
</ProxyMatch>

Then, go to your UrBackup server web interface and setup your web server URL as Internet client proxy (https://example.com) and the Internet server name/IP as 127.0.0.1. Internet clients should then start connecting via your web server to your UrBackup server. Once all clients connect this way you could turn off UrBackup’s build in Internet transfer encryption and rely on SSL.

Fixing client IP addresses

You may notice that on the status page all Internet clients now show the IP address of your web server as their IP address. Fixing this is a bit difficult, as there is no standard way to forward the client IP address information from the web server (compared to a normal HTTP proxy where there is a X-Forwarded-For header). So, a bit of hacking to fix this is in order. I modified the mod_proxy_connect apache plugin to forward the client IP information in a 50 byte buffer to the backend: mod_proxy_connect.c
On debian you could replace your original mod_proxy_connect with the modified one via the following commands:

apt install apache2-dev
wget https://gist.githubusercontent.com/uroni/143c0d7ed6169e89f2d6c59a870dd4cc/raw/28dd30b1f82938777c504f2afdc5f162fd91b3fd/mod_proxy_connect.c
apxs -i -a -c mod_proxy_connect.c

Then in the UrBackup server advanced global settings set “List of server IPs (proxys) from which to expect endpoint information (forwarded for) when connecting to Internet service (needs server restart)” to include your web server IP (127.0.0.1 in the example here). After a server restart you should be able to see the actual client IP instead of the web server IP on the status screen.

Fixing SNI errors

If you have multiple virtual hosts with SSL there is an issue with SNI. Apache2 automatically compares the hostname in the CONNECT request with the server name in the SSL connection (SNI) and rejects the request if they differ. The only solution (or ugly hack) I found to fix this was to add the hostname with the target IP to /etc/hosts and then use the hostname instead of the IP in the CONNECT request. I.e., add “127.0.0.1 example.com” to /etc/hosts, then replace 127.0.0.1 with example.com in all the configuration above.

Additional proxy authentication

As additional security layer, one can require proxy authentication. Clients need to know a username+password to get through the web server to the UrBackup server. With apache2 e.g.:

 htpasswd -c -b /etc/apache2/urbackup_password urbackup passw0rd 

Then modify the proxy section to:

<Proxy 127.0.0.1:55415>
    AuthType Basic
    AuthName "Restricted UrBackup"
    AuthBasicProvider file
    AuthUserFile "/etc/apache2/urbackup_password"
    Require user urbackup
</Proxy>

Afterward add username+password to the proxy url, that is e.g. https://urbackup:password@example.com