In GNOME, for various reasons (mostly historical), as part of the release process we still take our git repositories, run autoconf/automake on developer machines, and upload the resulting tarball to the FTP server. One question I had today: how many times do I, as a developer, need to download separate release tarballs before it would have been more efficient to just download the entire history as a git repository?
The answer to this obviously varies per repository. It's a function of variables such as the length of the module's history, whether or not it carries large static assets (e.g. png images), etc. Let's take a module I maintain, gobject-introspection. It has a nontrivial history dating back to 2005, with a period of peak activity followed by fairly steady development since.
What we want to compare here is the size of tarballs to the size of the packfile that git will serve us. Here’s how we do it:
$ ls -al gobject-introspection-1.33.2.tar.xz
-rw-rw-r--. 1 walters walters 1.1M Jun  5 11:58 gobject-introspection-1.33.2.tar.xz
$ git repack -a -d
Counting objects: 18501, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3499/3499), done.
Writing objects: 100% (18501/18501), done.
Total 18501 (delta 14971), reused 18501 (delta 14971)
$ du -sh .git
7.8M	.git
This means that for gobject-introspection, if I end up downloading the source code more than 7 times in tarball form (xz compressed), it would have been more efficient to download git instead. How about gtk+, which has a significantly longer history? The answer there is 16 times (current tarball is 13M, git repository is 213M). How about gnome-documents, which has a much shorter revision control history? Just 3 times!
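The break-even count here is just the ratio of the two sizes, rounded up to a whole number of downloads. A quick shell sketch of that arithmetic (the function name is mine, and the byte counts are approximations of the 1.1M tarball and 7.8M packed .git from the gobject-introspection example above):

```shell
# break_even TARBALL_BYTES REPO_BYTES
# Prints the minimum number of tarball downloads at which fetching
# the full git repository would have been cheaper.
break_even() {
    tarball=$1
    repo=$2
    # Ceiling division: the repository pays off once N * tarball >= repo.
    echo $(( (repo + tarball - 1) / tarball ))
}

# Roughly the gobject-introspection numbers: 1.1M tarball, 7.8M packed .git
break_even 1100000 7800000   # -> 8, i.e. cheaper from the 8th download on
```

The same function run against the gtk+ and gnome-documents sizes reproduces (to within rounding) the counts quoted above.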
A naive source code storage system based on tarballs would keep each release as a separate file, so what we've looked at above for network transmission would also apply in that case to at-rest storage. Anyways, just some data…