Linux SSD caching


Overview

I’ve long been a fan of fancy file system features like online resizing, compression, encryption, snapshots, etc. I’ve been running some of my data on btrfs partitions to exploit these features. There is one major feature still missing in my opinion, and that is SSD caching.

In an ideal world, the file system itself would handle all of the fancy features such as RAID, SSD caching, and so on. The file system knows which blocks contain valid data and which ones don’t, so it can avoid synchronizing unallocated blocks across multiple drives. Why back up deleted data? If you’re using dmraid or Linux md, the answer is simple: the block layer doesn’t know any better. Btrfs at least handles RAID across multiple disks correctly, and you don’t need to rely on the device-mapper or lvm2 to get that done.

When a drive in your RAID fails, these block-level RAID implementations (including dedicated full HW RAID cards) must copy every block to the new device, even if there isn’t any valid data there. If you’re only using 10 GB on a 10 TB RAID, the RAID doesn’t know this. Sigh. Btrfs, however, does know this.
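To put rough numbers on that difference, here is a back-of-the-envelope sketch in Python. The 150 MB/s sustained copy rate is purely an assumption for illustration, not a measurement, and the sketch simplifies by treating the whole 10 TB as the amount a block-level rebuild has to copy.

```python
# Rough rebuild-time arithmetic for the 10 GB used / 10 TB total example.
# ASSUMED_THROUGHPUT is a hypothetical figure, not a benchmark result.

TB = 10**12
GB = 10**9
MB = 10**6

ASSUMED_THROUGHPUT = 150 * MB  # bytes per second (assumption)


def copy_time_seconds(bytes_to_copy, throughput_bytes_per_s):
    """Time needed to copy a given number of bytes at a steady rate."""
    return bytes_to_copy / throughput_bytes_per_s


# Block-level RAID: copies everything, allocated or not.
block_level = copy_time_seconds(10 * TB, ASSUMED_THROUGHPUT)

# Allocation-aware rebuild: copies only the data actually in use.
allocation_aware = copy_time_seconds(10 * GB, ASSUMED_THROUGHPUT)

print(f"block-level rebuild:      {block_level / 3600:.1f} hours")
print(f"allocation-aware rebuild: {allocation_aware / 60:.1f} minutes")
```

Whatever throughput you plug in, the ratio between the two stays the same: the block-level rebuild moves 1000 times more data in this example.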

Say you have a huge 10 TB data array for things like your media collection and home directories. Now imagine that you regularly access maybe 100 GB of it, for things like the dot-files and browser cache in your home directory. Wouldn’t it be nice if you could speed those up automatically with a cache? I think so, and I don’t want to do it manually with multiple file systems and symlinks and all the mess that goes with it. I do that already and hate it. I think zfs has this feature, and the btrfs developers have talked about adding it. I don’t like zfs because of the screwy fuse/licensing stuff going on there thanks to Sun/Oracle.
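To get a feel for why a small cache in front of a big array can pay off, here is a toy simulation. It is only a sketch: it uses a plain LRU policy, which is not necessarily what any real SSD cache does, and the access pattern (90% of accesses landing on the hot 1% of blocks) is an assumption I made up to mirror the 100 GB out of 10 TB example.

```python
import random
from collections import OrderedDict


def simulate_lru_hit_ratio(total_blocks, hot_blocks, cache_blocks,
                           hot_fraction, accesses, seed=0):
    """Simulate an LRU block cache and return the fraction of hits.

    Blocks 0..hot_blocks-1 form the hot set; hot_fraction of all
    accesses go there, the rest are spread over the cold blocks.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    cache = OrderedDict()      # block number -> True, oldest entry first
    hits = 0
    for _ in range(accesses):
        if rng.random() < hot_fraction:
            block = rng.randrange(hot_blocks)
        else:
            block = rng.randrange(hot_blocks, total_blocks)
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used
    return hits / accesses


if __name__ == "__main__":
    # Hypothetical scale: 10,000 blocks total, 1% hot, cache holds 2%.
    ratio = simulate_lru_hit_ratio(
        total_blocks=10_000, hot_blocks=100, cache_blocks=200,
        hot_fraction=0.9, accesses=100_000)
    print(f"hit ratio: {ratio:.2%}")
```

With a cache a bit larger than the hot set, nearly all of the hot accesses end up being served from the cache, which is the whole appeal of putting an SSD in front of spinning disks.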

There are several solutions out there, driven largely by server demands, including dm-cache and bcache.

Next Steps?

It appears that dm-cache has been merged into the Linux 3.9 kernel. I’m most likely to trade max performance for ease of use (i.e., being in mainline). Attaining max performance would mean spending several days benchmarking my desktop use case (read: browsing the web and building code) just to understand what max performance is. So I’m likely to go with dm-cache. Bcache seems to be second in line in terms of performance and maturity.

Hopefully Ubuntu will release a kernel with CONFIG_DM_CACHE enabled so I don’t have to build and maintain my own kernels.
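A quick way to check whether the running kernel was built with the option is to look at its config file. The sketch below assumes the Debian/Ubuntu convention of shipping the config as /boot/config-<kernel release>; other distros may put it elsewhere (or not ship it at all), and the helper name kernel_option_state is just something I made up.

```python
import platform
from pathlib import Path


def kernel_option_state(config_text, option):
    """Return the value of a kernel config option ('y', 'm', ...).

    Returns None if the option is absent or marked "is not set".
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
        if line == f"# {option} is not set":
            return None
    return None


def main():
    # Assumed location (Debian/Ubuntu convention); adjust for your distro.
    path = Path("/boot") / f"config-{platform.release()}"
    if not path.exists():
        print(f"no kernel config found at {path}")
        return
    state = kernel_option_state(path.read_text(), "CONFIG_DM_CACHE")
    if state == "y":
        print("dm-cache is built into this kernel")
    elif state == "m":
        print("dm-cache is available as a module")
    else:
        print("dm-cache is not enabled in this kernel")


if __name__ == "__main__":
    main()
```

A value of "m" means the feature is built as a loadable module rather than compiled in, which is fine for this purpose.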
