Btrfs file system stability status update

I wrote before in a blog entry about btrfs stability. That was almost exactly seven years ago.The issue I had back then (about a hard link limit of hard links in the same directory) was fixed a long time ago.

Since then btrfs has come a long way. The most significant development was that SUSE started using btrfs as root file system and put several developers to work on finding and fixing bugs. Simultaneously only few new features were added. This significantly improved stability (at least for the root file system use case).

As just hinted, the main problem is that each developer just looks after their set of use cases. So, developers from Facebook mainly look at their use case of using btrfs on their HDFS storage nodes. They have redundancy etc. on a higher level, so they are probably using btrfs only with single disks and focusing on making btrfs performant and stable within this use case. SuSE, as said, uses it as root file system (system files), where there are single disks or, at most, mirrored disks. File system repair isn’t too important, as those file systems don’t get very large and contain non-unique data.

Compared to that when using UrBackup with btrfs:

  • Storage is for backup and/or long term archival (RAID6 etc. would be appropriate)
  • Can have a lot of threads writing data to the file system at once
  • Can have combination of threads writing lots of small files and large files (image backups)
  • Lots of snapshots (backups) of different sub-volumes, some of which can persist for a long time
  • Deduplication (at least file level)
  • Can’t reboot the server, when it gets stuck like you’d be able to with e.g. a HDFS node
  • UrBackup imposes a database style workload (db for deduplication), where latency is important as well as the backup workload, where throughput is important
  • File system can get very large and it is very time consuming to move data from a damaged (read-only mounted) file system to a new one

This, combined, was guaranteed to cause problems when seriously using btrfs as UrBackup storage. The most persistent issue was premature ENOSPC, where btrfs has an error saying it is out of space, remounting the file system read-only, even though there is still a lot of free space (both for data and metadata) available on the disk(s). The problem seems to be solved (on the systems I observe) with Linux 5.1 or Linux 4.19.x with this patch. RAID56 is still officially “unstable”.

Btrfs isn’t alone with having issues with this work load. When using UrBackup seriously (large number of clients with many simultaneous backups) e.g. with ZFS, I experienced a lot of different issues with ZFSOnLinux (especially in connection with memory/thread management), making ZFSOnLinux unusable. ZFS on FreeBSD was a little better, but there also were issues that occurred about once per week causing hangs.

Btrfs also isn’t the sole area where Linux was needing improvement in the storage area. For example, writeback throttling was significantly improved a few years ago. This example improves the memory management, making Linux more able to handle the mixed database and archival workload mentioned above. Not to say that all errors are fixed. For example, I recently learned that a call to synchronize the file system to make sure that data is actually written to disk does not return errors on Linux. There is now a work-around for that, at least for btrfs, in UrBackup but there hasn’t been a fix on the Linux side in general, yet.

Another important consideration is performance. One thing w.r.t. btrfs to keep in mind is that there is a feature, or better trade-off, where there is a back-reference from each disk data location to each file/item using that disk data location. So, you can ask btrfs to list all files having data at e.g. disk offset 726935388160 and it will do so fast compared to other file systems like EXT, ZFS, NTFS, etc. which would have to scan through all files. Managing the backref metadata comes with the cost of having more metadata in general, though. Operations such as writing new data to files, deleting files, deleting snapshots, etc. become a bit slower as the backref metadata has to be added or deleted, in addition to the (forward) metadata pointing from file to data. Btrfs goes to great lengths making this fast, though (delayed backrefs etc.). There is a number of unique features of btrfs which could be implemented because of the presence of back-references:

  • File system shrinking
  • Device removal
  • Balancing
  • If there is a corruption, it can show all the files (and offsets into files) affected
  • Quotas
  • Btrfs send

The performance problem with backrefs is that if a file has a lot of reflinks or snapshots, it may have a lot of backrefs. Running one of the operations above then involves a lot of iteration through all backrefs, making this operation (unusably) slow. UrBackup naturally creates a lot of snapshots and may reflink certain files many times. The work-around (at least for me) is to avoid the above operations at all cost and patch out the reflink walking in btrfs send.

Conclusion

Should you use btrfs with UrBackup? It is certainly the file system with which UrBackup works the best. Outside of that you’d have to see if you can have:

  • Linux kernel at least 4.19.x (I suggest using the latest LTS kernel). If you have ENOSPC issues with 4.19.x apply patch.
  • Able to avoid above operations such as file system shrinking or device removal
  • Able to avoid btrfs RAID56 (i.e. use RAID1(0) instead or use btrfs in single mode on top of a storage layer that provides integrity such as Ceph or the UrBackup Appliance)
  • I’d suggest using ECC memory, as btrfs repair once it is damaged is mostly not possible