There’s a lot of terminology we tend to use in the Free Software community, but we lack any kind of widely accepted dictionary for our “industry jargon”. Wikipedia has pages on some of this, but Wikipedia isn’t the same thing as a dictionary.
Anyways, I want to attempt a definition for “upstream”:
upstream(n): A FOSS project with an active and robust peer-review process.
I rely here on the definition of both “FOSS” and “project”. The Wikipedia page for FOSS is a good enough substitute for a dictionary entry, and let’s ignore for now the possible meanings of “project” here. The emphasis in my definition is on “active and robust peer-review process”. Why is that?
Because without peer review, there’s no interesting difference between, say, a Debian “package” (what many people seem to consider “downstream”) and a git repository on Sourceforge (what people consider “upstream”). There’s no point saying “push this change upstream” if that just means it gets added to a git repository without robust inspection. All that happened was some bytes got copied across the Internet from point A to point B.
1) What constitutes a “robust” peer-review process could be subject to a lot of debate.
2) If a distro has an active and robust peer-review process for its patches (many do), then it’s effectively an upstream of anything it packages.
I agree with the sentiment that a “drop your crap here” repository isn’t a useful upstream, especially with increased use of distributed VCS. But the definition as-is doesn’t capture the essence of what people mean when they say “upstream”.
How about ‘A source code repository or project that is used by users and distributors as the canonical source for a given software package’? I don’t think having a robust review process is the most important characteristic for many upstreams, although it is indeed for most of the successful, serious and relevant ones.
Agreed, ‘canonical source’ is the main attribute.
The Jargon File (at http://catb.org/jargon/html/ for the HTML version) is such a dictionary (a great resource, if a bit bitrotted).
It does have a definition for the adjective form of upstream (see http://catb.org/jargon/html/U/upstream.html for that).
I always considered “upstream” to be a community, not a code location. When I submit something “upstream” to Debian (I’m on Ubuntu), I don’t consider it putting the code into the Debian package, I consider it giving the code to the Debian community (who just happen to prefer that I leave it for them in a package).
I actually find your use of “active” much more interesting than “robust” in regards to the peer-review process.
There are too many inactive “upstream” projects which are still relevant. Does your definition try to solve that by making the “upstream” whoever actively maintains a patch set on top of that (e.g. a Debian packager)? Or is this just coincidence?
As far as the rest of the definition, I think you are missing the “canonical source” bits Gustavo brings up.
But there is merit in saying something like: Debian is upstream for Ubuntu, whereas GNOME is upstream for both, and Apache is, e.g., a direct upstream for Debian, but a transitive one for Ubuntu (through Debian).
In other words, maybe “upstream of Apache” should not be a term specifically referring to apache.org, which your definition allows. Perhaps we don’t want to use “upstream” for the “canonical source”, since we can just say “original source”.
Even still, there’ll always be a lot of middle ground to cover 🙂
If there was never a real peer review process for the original code, and the person maintaining the Debian package is just adding patches without review too, then *neither* are upstreams by my definition. They are just code repositories.
Look at it this way – I’m arguing that “upstream” refers to purity/cleanliness. Compare with drinking water. You don’t want to be downstream of a consumer who is doing whatever they want to the water.
I understand your point, but making that part of the definition of “upstream” seems confusing. If you have a garbage dump upstream from you in a river, you don’t stop calling it upstream, you just call it a bad upstream. Similarly, with software, an upstream that doesn’t apply some filtering and taste represents a bad upstream. So does an upstream that applies so much filtering and “taste” that they never accept anything.
Upstream is wherever you’re getting your code from. Whether it’s F/OSS is entirely irrelevant. Is NVIDIA not the ‘upstream’ for the nvidia graphics driver?
For me, the important part about “upstream” is that the code is shared with others.
If the debian branch has more fixes and features than the original project, I will still not consider it “upstream” if other distributions are still using the original project.
So for me, upstream is where most entities (distributions, users) will get their code from. It does not matter how it is managed or anything.
It’s IMO at least missing the fact that upstreams are only upstreams for something/someone else. I don’t think KDE talks about GNOME as upstream.
And if you do that, you have to define the relationship between the parties. And then you can go on to say what constitutes an upstream from a different part of that relationship.
Let’s try with a “picture”:
Linus Torvalds’ Linux Kernel => Fedora kernel => Some other Fedora based distribution’s kernel.
Leftmost: canonical source.
Moving away from the canonical source: moving downstream.
Moving towards the canonical source: moving upstream.
Imagine the rightmost project receiving a kernel bug report, and that distro’s developers fixing it. They’d want to push the fix upstream, so that they don’t have to maintain it themselves indefinitely. That could mean pushing to Fedora, or to Linus directly. Both would be upstream. In most cases, you only have one “upstream” from you.
It should now be obvious that “upstream/downstream” is an analogy to the flow of a river’s stream. Upstream simply indicates direction towards the headwaters.
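The picture above can also be written down as a tiny data structure. This is only a rough sketch: the dictionary, the project labels and the function names `upstreams` and `canonical_source` are made up for illustration, and it assumes every project has at most one direct upstream, which real projects do not always satisfy.

```python
# Hypothetical model of the "picture": each project maps to the project
# it gets its code from (None marks the canonical source).
UPSTREAM_OF = {
    "Fedora-based distro kernel": "Fedora kernel",
    "Fedora kernel": "Linus Torvalds' Linux kernel",
    "Linus Torvalds' Linux kernel": None,
}


def upstreams(project, upstream_of=UPSTREAM_OF):
    """Return every upstream of `project`, nearest first."""
    chain = []
    current = upstream_of[project]
    while current is not None:
        chain.append(current)
        current = upstream_of[current]
    return chain


def canonical_source(project, upstream_of=UPSTREAM_OF):
    """Return the headwaters: the last project reached when moving upstream."""
    chain = upstreams(project, upstream_of)
    return chain[-1] if chain else project


if __name__ == "__main__":
    # Both Fedora and Linus's tree are upstream of the derived distro.
    print(upstreams("Fedora-based distro kernel"))
    print(canonical_source("Fedora-based distro kernel"))
```

In this model “upstream” is purely relational: the Fedora kernel is upstream of the derived distro and downstream of Linus’s tree at the same time, and the canonical source is simply the project with nothing upstream of it.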
Don’t try to redefine “upstream” in this way. “Upstream” refers to the code repository that is shared between most or all GNU/Linux distributions, in many cases with BSD distributions as well, and that might also be used directly by end users trying to get the latest and greatest code. For the software to be any good it needs to be actively maintained, but even if it is not, and even if someone closely associated with one distro is doing all the work with no further review, what makes it “upstream” is that the work is shared by many distros.
The problem with using quality in the definition of “upstream” is that it tends to vary over time (as the contributors from the upstream project come and go).
@hobophobe, thanks for pointing to the jargon file entry. I didn’t know it existed and, having read it, see very little to add to it.
I also think that “upstream” is where you get your stuff from. This can be tarballs if you are a distro, or a widget library if you are an application.
Your proposal seems to be about what qualities a good upstream should have.