Still on Github

Over 4 years ago now, I wrote about moving ostree to Github, and I wanted to add an update here. I still think it was the right move.

Free Software is important to me – but I think Github overall provides a lot more benefit to FOSS than harm from its mostly proprietary nature. Providing a zero-cost, mostly reliable, featureful platform (also with various zero-cost CI available) is a huge accelerant to all the FOSS projects that use it. And when I have to contribute a patch via email to a project that has no CI checking, I sometimes just want to throw up my hands and move on.

But for the people who don’t agree with me and think Free Software needs free tools – I say awesome. I am very glad you exist, and really there’s about 20% of me that also agrees. That part of me is happy when I come across projects hosted on e.g. Gitlab.com at least. It’s obviously good for there to be some diversity and competition, beyond the fact that Gitlab is, at least at its core, FOSS. I also hope at some point somehow pagure’s model of storing issues and PR comments in Git takes off too. Or maybe it’ll be something like Radicle.

Anyways, that’s really all there is to say – I continue to use Github for those reasons but I’m happy to see new tooling that might also win in the future. Or just cool developments in existing tools. My goal here is just to have these current thoughts written down so I can link to them in various places.

Committed to the integrity of your root filesystem

Quite a while ago I came across the SQLite testing page and was impressed (and since then it’s gotten even better). They’ve clearly invested a lot in it, and I think SQLite’s ubiquity is well deserved.

When I started the ostree project I had this in mind but…testing is hard. We’ve had decent "unit test style" coverage since the start but that’s not very "real world". We’ve gone through a few test frameworks over the years. But to the point of this blog post: I finally had a chance to write some new testing code and I’m happy with how it turned out!

TL;DR: There’s a new "transactionality" test run on every PR that uses a mix of e.g. kill -9 ostree and reboot -ff while updates are running, and verifies that you safely end up with either the old or the new system. (PRs: ostree#2048 and ostree#2127).

But along the way there were some interesting twists.

Test frameworks and rebooting

I mentioned we’d been through a few test frameworks. An important thing to me is that ostree is a distribution-independent project; it’s used by a variety of systems today. Ideally, our tests can be run in multiple frameworks used by different distributions. That works easily for our "unit tests" of course, same as it does for many other projects (make check style tests that are nondestructive and run as non-root).

But our OSTree tests want a "real" system (usually a VM), and further the most interesting tests need to be destructive. More than that, we need to support rebooting the system under test.

I’d known about the Debian autopkgtest specification for a while, and when I was looking at testing I re-evaluated it. There are some things that are very Debian-specific (how tests are defined in the metadata), but in particular I really liked how it supports reboots.

There’s a big tension in test systems like this – is the test logic primarily run on the "system under test", or is it on some external system which manages the target via e.g. ssh? A lot of the problems we had in our prior test frameworks came from dealing with reboots in the latter style. Plus the latter style tends to strongly tie the test code to the test harness.

In the Fedora CoreOS group we use a system called "kola" which came from the original CoreOS project. It knows how to boot systems using Ignition in various clouds along with qemu. I added partial support for the Debian Autopkgtest specification to it (cosa#1528).

Avoiding shell script

A lot of the original ostree tests are in shell script. I keep finding myself writing shell even though I also keep being badly burned by it from time to time.

So another tangent along the way here: For writing new tests I’d resolved to use "not shell script". Python would be an obvious choice but…another large wrinkle here is that in CoreOS we don’t want interpreters in the base OS – they should run as containers (yes, a shell is obviously an interpreter too but…). So going the interpreted test route would drive us towards having our test framework run as a privileged container. I decided not to do this for a few reasons; the biggest is that it makes it much harder to test the system as other processes see it.

My preferred language nowadays is Rust, and it generates static-except-libc binaries that we can just copy to the host. Further, fortuitously someone else created Rust bindings to ostree and I’d been wanting an excuse to use that for a while too! However…some things are just too verbose via API, and plus we want to test the CLI too. Invoking subprocesses via Rust std::process::Command is also very verbose. So I ended up creating a sh-inline crate for Rust that makes it ergonomic to include snippets of strict mode bash in the code. This snippet is a good example. I’d like to make this even more ergonomic too, but my proc-macro-fu isn’t there yet.
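
To give a feel for the problem, here is a minimal sketch of running a strict mode bash snippet using only std::process::Command. This is not the sh-inline API; run_bash is a hypothetical helper name, and the sketch assumes bash is on the PATH.

```rust
use std::io;
use std::process::{Command, ExitStatus};

/// Hypothetical helper (not part of sh-inline): run a bash snippet in
/// "strict mode" and turn a non-zero exit status into an error.
fn run_bash(script: &str) -> io::Result<()> {
    // Prepend the usual strict-mode options so that a failing command or an
    // unset variable aborts the snippet instead of being silently ignored.
    let full = format!("set -euo pipefail\n{}", script);
    let status: ExitStatus = Command::new("bash").arg("-c").arg(&full).status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("bash snippet failed: {}", status),
        ))
    }
}
```

A test would then call something like run_bash("ostree --version") and propagate the error with the ? operator. Even this small wrapper shows how much ceremony is needed per snippet, which is the verbosity a macro can hide.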

Actually writing the test

OK so all those prerequisites out of the way, the first thing I did was write the code to do the "try upgrading and while that’s running, kill -9 it". That went reasonably quickly and worked well, so I moved on to the more interesting case of adding reboot -ff (simulating immediate power loss) as another "interrupt strategy". This exercises the whole stack through the kernel, particularly interactions with the filesystem.
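
The kill -9 strategy can be sketched roughly like this. It is a simplified illustration and not the actual test code; run_and_interrupt and is_old_or_new are hypothetical names, and a real test would pick the delay randomly and compare actual commit checksums.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};
use std::thread::sleep;
use std::time::Duration;

/// Start `cmd`, let it run for `delay`, then kill it without warning.
/// On Unix, `Child::kill` sends SIGKILL, the same signal as `kill -9`.
fn run_and_interrupt(cmd: &mut Command, delay: Duration) -> io::Result<ExitStatus> {
    let mut child: Child = cmd.spawn()?;
    sleep(delay);
    // If the child already finished there is nothing to kill; just report its status.
    if let Some(status) = child.try_wait()? {
        return Ok(status);
    }
    child.kill()?;
    child.wait()
}

/// The invariant to check after every interruption: the system must be on
/// exactly the old or the new commit, never something in between.
fn is_old_or_new(current: &str, old: &str, new: &str) -> bool {
    current == old || current == new
}
```

The upgrade command goes in as cmd, and after each interruption the test asserts is_old_or_new on whatever the system reports as its current deployment.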

However, this required completely rewriting the control flow because here the "test harness" is also being forcibly killed. We don’t want to rely on persisting our state to the disk on the system. I ended up serializing the process state into AUTOPKGTEST_REBOOT_MARK, which gets stored in the harness and passed back when the process starts again. Effectively then the test code becomes a sort of coroutine with the harness.
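
As a rough sketch of that idea: the state below is round-tripped through a short string that can serve as the reboot mark. The Progress struct, its fields and the "cycle-reboots" encoding are all hypothetical and chosen to keep the example dependency-free; only the AUTOPKGTEST_REBOOT_MARK variable name comes from the specification.

```rust
use std::env;

/// Hypothetical test progress that must survive a forced reboot.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Progress {
    /// Upgrade cycles completed so far.
    cycle: u32,
    /// How many of those cycles were cut short by a forced reboot.
    reboots: u32,
}

impl Progress {
    /// Encode the state into a short string usable as a reboot mark.
    fn to_mark(&self) -> String {
        format!("{}-{}", self.cycle, self.reboots)
    }

    /// Decode a mark; returns `None` if it was not produced by `to_mark`.
    fn from_mark(mark: &str) -> Option<Progress> {
        let (cycle, reboots) = mark.split_once('-')?;
        Some(Progress {
            cycle: cycle.parse().ok()?,
            reboots: reboots.parse().ok()?,
        })
    }

    /// Decide where to resume: no mark (or an empty one) means a fresh run,
    /// otherwise continue from the state the harness handed back.
    fn resume(mark: Option<&str>) -> Option<Progress> {
        match mark {
            None | Some("") => Some(Progress { cycle: 0, reboots: 0 }),
            Some(m) => Progress::from_mark(m),
        }
    }

    /// Read the mark from the environment variable set by the harness.
    fn from_env() -> Option<Progress> {
        let mark = env::var("AUTOPKGTEST_REBOOT_MARK").ok();
        Progress::resume(mark.as_deref())
    }
}
```

On every start the test calls Progress::from_env(), does one more cycle, and hands the updated to_mark() value to the harness before triggering the next reboot. Nothing is written to the disk of the system under test, so a lost write cannot confuse the test itself.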

Found problems

Depending on how you look at it, fortunately or unfortunately: none so far. One motivation for writing this test was to try to reproduce a bug a user filed that showed an error message from the boot loader configuration handling code. I haven’t managed to reproduce that yet. I did manually inject some faults in the code and verify that the test failed of course. And in the past I’ve of course done some manual testing to verify that ostree does what it says on the box for implementing transactional upgrades. But there’s clearly more to explore here.

Next steps

One thing I plan to explore next here is fault injection, probably with strace fault injection. This may also combine well with adding support for the harness to request explicit sleep() calls to widen the window on possible races. Plus so far while I’ve mentioned support for other distributions, this is only testing Fedora CoreOS in its default mode; e.g. we’re only validating xfs and not other Linux filesystems, etc.

Are we testing like SQLite yet?

Definitely not, but I’m happy that I made some progress closer to that goal! It was an interesting project and I’m looking forward to building more of it per above. Outside of OSTree, the goal of this blog post was to write down some of the "lessons learned" for others working in this space. For example, I hope some people working in the Linux-based OS testing space look at the Debian autopkgtest specification; it can be hard to come to consensus on test frameworks and standards, but there are at least some good ideas there. Also I think the mix of "Rust with some inline shell script" worked pretty well for these types of tests; particularly if the CLI outputs JSON, deserializing with Serde is great. Though taking the Rust compile time hit for tests is a downside.

But in the end, I can at least now say that every pull request to OSTree runs through a test suite that ensures it survives being forcibly terminated while an update is running. The integrity of your root filesystem is very important to me – it should be robust and image-like, but still a Linux system in the end. If this sounds good to you, I hope you check out one of the distributions that use it!