ostree v2013.6 released

I’ve been working on Free Software for a long time now, on many different projects over the years. In approximate order: Emacs, a variety of things in Debian (PowerPC port, GNOME packaging, GPG verification for apt, CDBS, SELinux porting), then Rhythmbox, then many years of Fedora and Red Hat Enterprise Linux, DBus maintenance, a side stint in web services, and of course my favorite project, GNOME, which in turn includes working on a lot of infrastructure like systemd, accountsservice, gdm, and polkit. A lot of these have been very fun to work on, and I think most were useful contributions.

But I think the most valuable contribution to Free Software I will have made will be my most recent project, OSTree. I’ve just released version 2013.6.

The documentation provides an overview in more depth, but briefly here: it’s a tool for parallel installation and atomic upgrades of general-purpose Linux-kernel based operating systems, designed to integrate well with a systemd/GNU userspace. You can use it to safely upgrade client machines over plain HTTP, and, longer term, it is meant to sit underneath package systems (but above the filesystem and block storage layers; use whatever you want there).

I have been obsessed with upgrades for a long time, on and off. (Side note: interesting that KSplice does now exist). How OS upgrades work affects everything in the system architecture, from the development process to end system reliability. It’s a problem domain that spans client machines, cloud deployments, traditional servers, and embedded devices.

Atomic and safe upgrades

Over the past few years, my interest in the domain rekindled for several reasons. One is that I happened to stumble across NixOS. While obviously there are a lot of software deployment mechanisms out there, if you filter by “has atomic and safe upgrades”, the list becomes quite small and most of what’s there is quite specialized. So I studied Nix carefully; the executive summary is that while they have cool ideas, rebuilding or redownloading the entire system for a glibc security update is deeply impractical. But their approach of having a symlink farm which is the target of a bootloader entry stuck in my mind.
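
To make the symlink idea a bit more concrete, here is a minimal Python sketch of repointing a symlink atomically, so that a reader of the link always sees either the old tree or the new one. The function name `switch_current`, the link name `current`, and the directory names are hypothetical and chosen only for illustration; this is not taken from Nix or OSTree, and a real bootloader integration involves more than this.

```python
import os
import tempfile


def switch_current(root: str, new_target: str, link_name: str = "current") -> None:
    """Atomically repoint root/link_name at new_target (hypothetical helper)."""
    link_path = os.path.join(root, link_name)
    tmp_path = link_path + ".tmp"
    # Remove a stale temporary link left behind by an interrupted run.
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    # Create the new link under a temporary name first...
    os.symlink(new_target, tmp_path)
    # ...then rename it over the old one. On POSIX, rename replaces the
    # destination in a single step, so there is no window with no link.
    os.replace(tmp_path, link_path)


def demo_switch() -> tuple:
    """Switch between two empty example trees and report what the link saw."""
    with tempfile.TemporaryDirectory() as root:
        for name in ("tree-a", "tree-b"):
            os.mkdir(os.path.join(root, name))
        switch_current(root, "tree-a")
        first = os.readlink(os.path.join(root, "current"))
        switch_current(root, "tree-b")
        second = os.readlink(os.path.join(root, "current"))
        return first, second
```

Going back to the previous tree is the same operation with the old target, which is what makes rollback cheap in this style of design.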

Also in the last few years, Chromium OS appeared with its autoupdater. The Chromium OS updater is an extremely efficient design…for their use case. But it’s hard to generalize the design to the wider world; doubling the disk space usage in every cloud image is a rather large penalty. Furthermore, the Chromium OS model doesn’t have much of a story for locally generated systems. If you want to customize the OS, you are completely unable to reuse their updates server, as it is all about deltas between fixed disk images. Again, this all makes sense if your model is that the only apps are web apps, but that’s a very fixed use case.

In short, OSTree is more efficient than Nix in a number of ways, and most importantly only handles filesystem trees; it’s not a package system. I posted plans for approaching the efficiency of the Chromium OS updater on the wire.

But, if OSTree is so cool, why isn’t it powering your package system? The simple answer is because it’s really quite deeply invasive for existing package systems like dpkg/rpm and all the others that are basically just clones of the same idea with different names. This quote from a LWN commenter sums it up:

What OSTree is proposing is somewhat unclear, but appears to require rebooting on *every single package upgrade* so as to switch into a new chroot containing that package. That means several times a day for me. Not a bloody chance am I letting something like *that* near any of my systems outside a VM: it is transparently ridiculous and optimizing for a very few programs that might need to take extra measures to avoid being broken by updates happening underneath them…

Right. On the plus side, you get atomic upgrades, and this is a tradeoff that a substantial number of people would likely take. Ultimately of course, as I replied to the commenter, it’s certainly possible to imagine carefully engineering the OS so that a certain subset of changes can be “live applied”, while still preserving atomic upgrades. Furthermore, while OSTree does not come with or force any particular independent application installation mechanism, it is designed to provide a fundamental layer for existing and new ones.

Parallel installation, the OS development process, and system quality

The core of OSTree is so simple – it’s just booting into hardlinked chroots – that it was relatively easy to enable something else besides atomic upgrades, which is easy parallel installation of operating systems. Not only does it make it easy to dual boot say a stable OS and the bleeding edge, if you have the disk space, you can thousand-boot, or more.
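
As a rough illustration of why hardlinked trees make parallel installation cheap, the following Python sketch builds two trees from one shared object store, so an unchanged file occupies disk space only once. The store layout, the `manifest` format (path to checksum), and the function names are assumptions made for this example; they are not OSTree’s actual on-disk format or API.

```python
import hashlib
import os
import tempfile


def store_object(objects_dir: str, data: bytes) -> str:
    """Write data into the store under its SHA-256 checksum; return the checksum."""
    checksum = hashlib.sha256(data).hexdigest()
    path = os.path.join(objects_dir, checksum)
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(data)
    return checksum


def checkout(objects_dir: str, manifest: dict, dest: str) -> None:
    """Build a tree at dest by hardlinking each path to its stored object."""
    for rel_path, checksum in manifest.items():
        target = os.path.join(dest, rel_path)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        os.link(os.path.join(objects_dir, checksum), target)


def demo_checkout() -> bool:
    """Check out two trees that differ in one file and compare inodes."""
    with tempfile.TemporaryDirectory() as root:
        objects = os.path.join(root, "objects")
        os.mkdir(objects)
        bash = store_object(objects, b"shell v1")
        libc_old = store_object(objects, b"libc v1")
        libc_new = store_object(objects, b"libc v2")
        old = {"usr/bin/bash": bash, "usr/lib/libc.so": libc_old}
        new = {"usr/bin/bash": bash, "usr/lib/libc.so": libc_new}
        checkout(objects, old, os.path.join(root, "deploy-old"))
        checkout(objects, new, os.path.join(root, "deploy-new"))
        a = os.stat(os.path.join(root, "deploy-old", "usr/bin/bash"))
        b = os.stat(os.path.join(root, "deploy-new", "usr/bin/bash"))
        # The unchanged file is one inode with three names:
        # the stored object plus one link in each tree.
        return a.st_ino == b.st_ino and a.st_nlink == 3
```

One consequence of this layout is that files in a checked-out tree must not be modified in place, since a write through one link would change every tree that shares the object.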

Why is this so critical? It’s because while package systems have a lot of flexibility, there’s one extremely important gap: The ability to try new code, and go back if it doesn’t work. This was covered in my original GUADEC presentation. Typically package-based distributions manage this by creating several different layers. Debian has stable, testing, unstable, and experimental. But if you upgrade from stable to unstable to see if suspend works for example, the package system will fight you trying to downgrade; the concept of “newer is better” is baked deeply into dpkg/rpm and everything built on top.

Being a software engineer working on an extremely complex general-purpose system like GNOME without massive development resources, let me tell you – it’s easy to break things unintentionally. Having a subset of users (but not everyone) run the bleeding edge, as Firefox has with Nightly, would be a real benefit, especially if they also have a mechanism to fall back to the previous working build. And in fact, I have a separate project gnome-ostree that’s intended to be exactly that. Although it has an uninspired name, it’s better than “nightly” – it’s fully continuous, updated easily 70 times a day as git commits are made. But while it serves as an important testing base for validating the core OSTree designs in a relatively constrained scenario, it’s a separate project, and not the topic of this blog post.

OSTree underneath package systems

There are a large number of systems which fit into the model of efficiently replicating pre-constructed OS trees from a build server; many basic “client” workloads as well as cloud deployments are best delivered this way. That said, the “package” model where filesystem trees are computed dynamically on individual machines is very flexible, and some of that flexibility is entirely valid. Particularly for organizations which have invested heavily in it, it doesn’t make sense to toss out that investment; I want to support it.

While I’ve been relatively quiet about OSTree so far, I think it’s finally reached a point in implementation quality and design where I’d like to see more package system maintainers and distributions attempt to experiment with it; that’s the goal of this blog post. A quick weekend hack a while ago resulted in fedora-ostree. Since then, I worked on it a bit more this weekend, and updated it.

This is a long term effort; as the LWN commenter above said, OSTree has wildly different tradeoffs from existing package system semantics. There is a new section of the OSTree manual describing changes that many existing general-purpose distributions will have to make to adapt.

And clearly, hashing out a design where some changes can be applied live (after they are atomically set up for the next boot) would be really nice. If you’re logged into a system and want to zypper/yum/apt-get install strace, that’s just a new file in /usr/bin, so there’s no reason we can’t make it appear right away. But as you go up from there in complexity, it gets more difficult to do without race conditions. Luckily, we have the complete source code to the operating system; and starting from a fundamental basis of reliability and safety, it is much easier to add features like speed and flexibility.
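
One way to think about the “just a new file” case is as a check on the difference between the current tree and the next one. The sketch below is purely hypothetical: the `classify_change` name, the path-to-checksum manifests, and the policy itself are assumptions for illustration, not something OSTree implements. A real policy would need more care, since even a new file (a plugin or a configuration drop-in, say) can change the behavior of running programs.

```python
def classify_change(old: dict, new: dict) -> str:
    """Compare two path -> checksum manifests (hypothetical format).

    Returns "none" if the trees are identical, "additive" if the new tree
    only adds paths, and "needs-reboot" if any existing path was changed
    or removed.
    """
    if old == new:
        return "none"
    for path, checksum in old.items():
        # A missing or different checksum means an existing file changed.
        if new.get(path) != checksum:
            return "needs-reboot"
    return "additive"


# Example manifests; the checksums are placeholders, not real hashes.
base = {"usr/bin/bash": "aaa", "usr/lib/libc.so": "bbb"}
with_strace = dict(base, **{"usr/bin/strace": "ccc"})
with_new_libc = dict(base, **{"usr/lib/libc.so": "ddd"})

print(classify_change(base, with_strace))    # only a new path was added
print(classify_change(base, with_new_libc))  # an existing path changed
```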

If you too share my passion for atomic upgrades, operating system upgrade engineering, continuous integration and such, then check out the git repository and join the mailing list; it’s a great time to join the project, as there are several new contributors, and it’s just fun to work on!


17 comments

some human says:

August 26, 2013 at 3:56 pm

I don’t see how this added complexity adds value over ZFS snapshots.

And how is it atomic, if I have to reboot? It’s just immutable (also not a new concept -> ro mount)

zfs send does the rest.

In the freebsd bootloader I can even select from which snapshot to boot.

Also see solaris live upgrade with zfs.

- Colin Walters says:
  
  August 26, 2013 at 4:17 pm
  
  The closest analogue in the Solaris world looks like “beadm”: http://docs.oracle.com/cd/E26502_01/html/E29052/snap3.html#scrolltoc
  
  It looks like if you do things a bit by hand, you can get the OSTree semantics where updates only take effect on the next boot, although it’s not clear to me whether boot environment swaps are actually atomic.
  
  I think doing “package management” on a live system works for simple cases, has race conditions for less trivial cases, and is full of failure for large transitions, particularly if you have an enormous package set that can be installed in a huge matrix of combinations and configurations.
  
  Solaris + ZFS snapshots appears to default to “save snapshot, then perform live manipulation”. I don’t have any data; perhaps IPS mitigates this type of thing? Or perhaps the package set is smaller. Or maybe it is buggy in various situations, and people just accept it 😉 Again, I don’t know, though I’d be interested in links to actual data.
  
  A secondary answer here is that if you are willing to take a hard dependency on BTRFS/ZFS, don’t care about the “inplace updates” problem and/or have mitigations, then OSTree may indeed not be for you! Sometimes, like the Chromium case, if you have the resources, it makes sense to engineer and maintain targeted solutions.
  
  But I do think OSTree is a good “generic” solution, and over time we’ll likely add support for deployments which want to tie themselves to a particular block layer like BTRFS.
  
  - some human says:
    
    August 27, 2013 at 3:40 am
    
    Hi! Thanks for the reply.
    
    beadm (I did not know it before; I mainly use FreeBSD) does indeed look exactly like a tool implementing the workflow I described.
    
    I also have great doubts about all this in-place magic happening in updates.
    
    After a cursory reading of the OSTree docs, I just wondered how it differs from the filesystem-snapshot approach, since OSTree seems to involve some elaborate techniques.
    
    The part-time sysadmin in me just groaned at the sight of even more complexity to juggle.
    
    Maybe a section about the similarities to and differences from other approaches/implementations would be really helpful.
    
    In any case, impressive work!
  - some human says:
    
    August 30, 2013 at 1:36 am
    
    I think you didn’t get an important point of the snapshot upgrade workflow.
    
    1. Clone system FS
    2. Perform package upgrade in cloned fs
    3. Boot using new clone (what you call atomic)
    
    4. On error boot into original fs
    
    — with zfs a clone is essentially free
    
    In the related projects this is misrepresented.
James Cameron says:

August 27, 2013 at 2:16 am

We use a symlink farm as the target of the bootloader on the OLPC XO, for atomic upgrades. Our upgrade utility is olpc-update.

- Colin Walters says:
  
  August 27, 2013 at 2:03 pm
  
  Yes! I should mention olpc-update somewhere. I had forgotten about it when I started this project – it’s certainly been in this space a lot longer. The main differences are:
  
  * olpc-update uses rsync, whereas OSTree only comes with built-in code to replicate from static webservers over HTTP. rsync requires more server resources, whereas OSTree is at present probably less efficient on the wire, but it will become a *lot* more efficient than rsync when I finish “static deltas”. (That comes at the cost of more storage space on the server, but that’s cheap nowadays. Requiring CPU computation per client means you can’t use Amazon S3 type storage systems.)
  
  * OSTree is designed to parallel install even completely independent operating systems with their own copy of /var, whereas olpc-update is just about upgrading olpc.
  
  * I have the goal of putting OSTree underneath package systems; it has a shared library API and such to enable that, whereas olpc-update just has /versions/number.
  
  - puerexmachina says:
    
    August 29, 2013 at 3:12 pm
    
    Have you looked at zsync for the deltas?
    
    http://zsync.moria.org.uk/
jb says:

August 27, 2013 at 6:35 am

There seems to be some buzz around CoreOS ( http://coreos.com/ ), do you have any opinions about that and how it compares to OSTree? In short, CoreOS seems to be a minimal distro, basically only kernel + systemd, and then you run everything else in containers, managed by docker ( http://www.docker.io/ ). It has two separate root filesystems, you switch between them when updating so you get atomic updates; apparently this has been lifted from Chromium. The containers themselves are not part of this dual root fs thing (so the root fs itself need not be large), apparently the idea is to update containers by preparing a new one, shutting down the old one(s) and launching instances of the new version?

My brief, and admittedly very ignorant, view is that it could be a very nice thing if you’re Google, Facebook or something like that, and you need to deploy some application on a zillion servers. But for something like a desktop, probably not that useful? If you run, say, Ubuntu in a container you’re not really gaining that much vs. just running Ubuntu on bare metal. Or am I missing something?

- Colin Walters says:
  
  August 27, 2013 at 12:02 pm
  
  It is funny that CoreOS reuses the Chromium update model because that was one I said wouldn’t work well for cloud, but it does actually make sense for them, because they’re intended to be a “thin” OS and everything is delivered as docker containers. Just like for Chromium OS, the local OS is only there to run the web browser.
  
  I still believe it makes sense to ship flexible “thick” operating systems, like current Debian and Red Hat Enterprise Linux. The vision I have here is that OSTree provides an enabling layer for these package-based systems to have atomic upgrades and parallel installation, while ideally keeping their package systems mostly intact.
  
  This discussion is a bit complicated, but basically OSTree is a middle ground between an inflexible system like CoreOS, and the incredible flexibility of package systems.
  
Colin Walters says:

August 27, 2013 at 11:48 am

There is a lot of interesting discussion over on Hacker News: https://news.ycombinator.com/item?id=6277518

Trebor says:

August 27, 2013 at 10:17 pm

Have you seen:

“http://cernvm.cern.ch/portal/sites/cernvm.cern.ch/files/cvmfstech-2.1-4.pdf”

and could you comment on it?

- Colin Walters says:
  
  August 27, 2013 at 10:28 pm
  
  I hadn’t! I really need to have a centralized comparison table; there’s an out of date one here https://wiki.gnome.org/OSTree/RelatedProjects
  
  So…my executive summary on OSTree versus CernVM-FS is that OSTree is entirely static; it lays out complete filesystems up front, whereas CernVM-FS does dynamic HTTP requests with caching. I’d say that it makes sense to use OSTree to deploy at least enough of a userspace to run the basic node (i.e. libfuse.so is in /usr/lib and versioned with OSTree), and then you use CernVM-FS for say /usr/local (or in their example, /cvmfs), where all of your custom code lives.
  