On the topic of release quality

This post was inspired by Montys post about MySQL release 5.1, and the many discussions it has created both around the web and offline. The question mainly being - why has nobody made such a post about PostgreSQL yet? Is it because it hasn't happened, or just because nobody has posted about it.

We all know that there is no such thing as bug-free software. This obviously includes both PostgreSQL and MySQL, as well as all the commercial competitors - claiming anything else is clearly untrue. But what I find remarkable from Montys post are mainly:

  • MySQL 5.1 has been released with known critical bugs (crash/wrong result). As far as I know, this has never been done with PostgreSQL (at least if we're talking "modern times", since after the product became reasonably stable). And certainly not the number of issues that Monty has listed for MySQL 5.1 - it's not just one or two!
  • There are also critical bugs that were present in 5.0, that still haven't been fixed in 5.1.
  • We already know that MySQL 5.0 was released "too early". We already know that MySQL 5.1 was declared RC "too early". It's remarkable that 5.1 was also released "too early" in that case - and for non-technical reasons again. In theory, this "can't happen" with the PostgreSQL release model, since it's based only on when the features are "ready", not when you need a new release for some other reasons. That's in theory. I know in the past we have tried to schedule releases around certain conferences and such, for better announcement effects. In practice, though, I think this has only led to a release being postponed, never being rushed.
  • MySQL apparantly keep some bug reports hidden from the public (the referenced bug 37936 for example). How's that for open... I approve of keeping them hidden for security bugs only - but if that bug is a security bug, it's clearly taken way too long to fix, given the dates on bugs around it.
  • I still think they've designed their version numbering system to deliberately confuse the customers about what is a beta, what is a release candidate and what is a release. What is so easy with either labeling them as PostgreSQL does (8.3beta, 8.3RC, 8.3.0), or using the in open source popular system of using even-numbered releases for stable releases and odd numbers for beta/testing releases?
  • It took them over a year to get from Release Candidate to release.

Now, there are several posts I found that are questioning Montys post, saying that the quality is just fine - and backing this up with actual experiences in deploying 5.1. I do think both sides are right here - it's perfectly possible to deploy 5.1 without hitting these bugs, as they are "corner-case" issues. But that does not decrease the importance of having releases without known bugs in them. And if there are known bugs, they should at least be listed very clearly in the release notes/announcement. Not doing this is, IMHO, simply irresponsible. Especially least for a database server which is supposed to safeguard all your work...

So how does PostgreSQL measure up

So how does PostgreSQL measure up in all this? Well, we aim to never release with known critical bugs in the software. But that all depends on the definition of "known". For example, PostgreSQL does not have a bug-tracker. For those following our lists, you will know that there is an endless debate started whenever somebody tries to discuss that, so I'm not going to discuss the topic in general. I am personally on the side that thinks we should have one. But I also feel that the MySQL bug-tracker is a clear example of the kind that we do not want to have. I don't have any exceedingly high thoughts about the bug-freeness of MySQL, but I have a hard time thinking that all the 41,000+ entries in their tracker are actually bugs. So it represents exactly the part that I really dislike about a lot of the bug-trackers around for open source projects - it's full of junk that aren't bugs, which makes it very hard to find what actually are bugs. (so thanks to Monty for outlining some for us :-P)

That means we can't use "bugs in the bug-tracker" as a metric. PostgreSQL does keep an "open items list" before each release, which does track things that came up during the development. It used to just be a static file on Bruce Momjians machine - now it's in the wiki. It's a lot less structured, but it does give us a lot of help in the let's-not-forget-this-bug-before-release department. But it doesn't track bugs past release, and the history is not preserved across versions (that I'm aware of at least).

Another metric would be to look at what actually changes in the minor versions. If a full release is not reasonably bug-free, there will be a lot of turnaround after the release, once people put it in production. We are, as many open source projects, lucky in that we have a lot of people running our betas and release candidates in near-production environments, which means we can iron out a lot of things before release. But it's obvious that a lot more people will deploy a product once it's released (in fact, that has been quoted as one of the reasons why MySQL 5.1 hit RC so early - "to get more people testing it"). So I decided to pull some statistics from our CVS repository about the latest four major PostgreSQL releases (8.0 to 8.3), and the turnaround on their branches after commit.

http://www.smugmug.com/photos/428148379_hX2qs-O.jpg Update: Since several people asked: the X axis is "minor version number"

This first graph shows the number of commits on the back-branches. This has been somewhat filtered - I have removed for example commits that are just updates of translations, and the commits that stamp the release itself. Or rather - I have tried to, with some simple regexp matching. So don't take these numbers as absolute, but they should be good enough to show trends. Also note that due to some problems with cvs not wanting to be nice to me, there are no commit statistics for 8.0.1, that's why the blue bar is missing in the first one.

A couple of things I think we can read from this graph. First of all, the total number of commits in the back-branches after a release are very low. Only very few of the branches have more than 50 commits going in. And this includes things that are just whitespace fixes or comment fixes, I didn't manually look through things to determine what were actual code fixes. It also "kind of" shows that 8.1 had more patches going in early on - which aligns with the general consensus that 8.1 was not one of our best releases. 8.3 had quite a lot of patches going into the first two releases, which can probably be attributed to several of the new features being very complex and thus obviously exposed to corner-case bugs. And there is (whew, lucky!) a steady decline in the number of fixes as the versions gets older - both because there are less people using them and finding bugs, and obviously because there are less bugs left in them.

So, moving on:

http://www.smugmug.com/photos/428148260_667SB-O.jpg Update: Since several people asked: the X axis is "minor version number"

This second graph shows the number of lines added and removed. I used a simple diffstat output and probably ran diff in the wrong mode first, so it all came out as inserts + deletes. Thus, any changed row actually counts as two in the graph. The true number of modified rows is therefore much lower than the ones indicated in the graph - but not necessarily half, since there are still pure additions and removals happened.

The first thing that stands out from this graph, is 8.1.1 having almost 10,000 lines changed. The reason for this, after trolling the cvs logs for a while, is that pgindent was re-run after release. This is the tool we use to make sure indenting and code formatting is consistent across the source code. Obviously, this generates a lot of changes, all of which are whitespace only. We normally don't run this after release, because it tends to make it harder to track history of the files. But for some reason that I can't remember (I'm sure it can be found in the list archives if somebody cares enough to go digging), we did it for 8.1.

In the 8.0 series, 8.0.2 stands out as having a lot of changes. This was the release where we pulled out the ARC cache method (for several reasons) and replaced it with a completely new one.

8.3 overall shows quite a lot of changes in the early releases, with a large drop for 8.3.5. But it's too early to tell if that's a permanent drop yet. But even "quite a lot of changes" is only just over 2,000 lines of code, which is less than 0.3%25 of the code base (code base size according to Ohloh, and not counting the effect of the add/remove for change as mentioned above).

I don't know the MySQL code base enough to make similar graphs for them - but would be very interested in seeing them if somebody else did!

Finally, another metric:

http://www.smugmug.com/photos/428159720_kpdox-X3.jpg

This is the number of lines modified between the latest release candidate and the release. This is essentially the number of lines that go into the release without being tested outside the developers. Before you get all excited about how large this number is - looking through the actual changes, very much of it is actually changes to error messages, comments or whitespace.

The end!

Yeah, that turned out to be a very long post. If there are more statistics around the code base that you think would be interested, let me know! I have some of this stuff loaded in a database now, so I can do some more analysis without too much work as long as it fits within that model.


Comments

the reason for the sharp drop in 8.3.5 can be explained by the rather short time between the 8.3.4 and the 8.3.5 release due to the GIST-Index related bug found after 8.3.4 was released. Maybe it would also be interesting to graph the number of commits to all branches over time as in commits/month or such.

Posted on Dec 1, 2008 at 18:25 by Stefan Kaltenbrunner.

A very interesting post.

"As far as I know, this has never been done with PostgreSQL (at least if we're talking "modern times", since after the product became reasonably stable)."

There is one issue that has been a bug in PostgreSQL for a long time. If you do something like "UPDATE foo SET i = i + 1" and "i" has a unique index on it, the statement will probably fail.

I still agree with your point, of course. I think this is a case of "the exception proves the rule". The fact that it takes effort to even think of one such example in PostgreSQL shows the stark contrast between the projects.

Posted on Dec 1, 2008 at 18:44 by Jeff Davis.

While that's a good point, I don't think that can be called a critical bug (crash/wrong result). It's certainly an annoyance factor though. So I think even that one is a bit of a stretch.

Posted on Dec 1, 2008 at 18:48 by Magnus.

I consider that to be a "wrong result" because it can tell you something is a constraint violation where it should not -- and it's non-deterministic, meaning that you might miss it in testing and hit it in production.

However, it absolutely is a stretch. I have never hit this in a production application personally. It also doesn't cause data corruption, and has reasonable workarounds.

Posted on Dec 1, 2008 at 18:53 by Jeff Davis.

yeah while annoying this is documented and mentioned on the todo(http://wiki.postgresql.org/wiki/Todo#Integrity_Constraints) as well. the main problem is that nobody yet found a way to implement deferred unique constraints in a way that would not kill performance.

Posted on Dec 1, 2008 at 22:05 by Stefan Kaltenbrunner.

One example of a long known bug (found during 8.0 development) which crashed the server that was only fixed in 8.3 was:

http://archives.postgresql.org/pgsql-bugs/2004-06/msg00154.php

Posted on Dec 2, 2008 at 00:35 by Kris Jurka.

Another thing that is worth noting is that a pretty large percentage of back-branch commits involve fixing the same bug across more than one branch. It would be really interesting to filter out the "duplicate" commits and blame only the oldest branch (where such a bug presumably originated) for such a group of commits. I think that that would give a better picture of the branch-by-branch stability. Whether it's worth the trouble is another question though.

Posted on Dec 2, 2008 at 02:13 by Tom Lane.

Yeah. I think the X-axis here is not really well chosen. Could we see the same data as four separate time series (8.3.x above 8.2.x above 8.1.x etc) with a shared X-axis of actual calendar date? That would make it more obvious when there are correlated changes in several branches, such as the big encoding fixes that swelled the numbers for both 8.0.8 and 8.1.4. In a display like that you could furthermore break down the commits by actual date rather than aggregating them into the point releases.

Posted on Dec 2, 2008 at 02:32 by Tom Lane.

A comment on the "hidden bug" business: I don't know the exact story behind mysql's bug 37936 of course, but I do know that a lot of bugs in Red Hat's bugzilla are marked private because they contain some amount of customer-confidential information. In many cases there would be no harm in making such reports public, but no one has the time to run around and get them cleared for publicizing. 37936 very possibly has a similar problem. There are enough crashing bugs in their bugzilla that aren't hidden to make me doubt that they hid this just because it was a crasher.

Posted on Dec 2, 2008 at 02:45 by Tom Lane.

Can you calculate the defect containment effectiveness of a MySQL v. Postgrest release?

Example: Bugs before RC1 = 100 Bugs after RC1 = 5

DCE = before / (before+after) * 100%25 = 100 / (100+5) * 100%25 = 95.24%25

Posted on Dec 2, 2008 at 05:00 by dano.

Interesting post, Magnus. I would be curious what the overall changes are between major releases. One hypothesis is that PostgreSQL stability is due to conservatism verging paranoia about new features. Paranoia is good if you want things that work.

Posted on Dec 2, 2008 at 05:32 by Robert Hodges.

Incidentally, I don't think the MySQL "problem" is exactly quality per se, as there's no evidence that MySQL is buggier than many commercial DBMS products earlier in their product cycles. It's rather that MySQL took so long to get 5.1 out the door at the quality level they eventually achieved. Similarly, the MySQL engineering group is often very slow to address obvious bugs that the PostgreSQL community either avoided up front or would have addressed quickly once brought to light. This slowness is one of the key reasons for the fragmentation of the MySQL codeline that has occurred over the last year. I imagine the root cause lies in the way MySQL releases are managed, as code can generally be fixed if you have the will to do so.

Posted on Dec 2, 2008 at 05:38 by Robert Hodges.

It's really strange that the PostgreSQL devs do not believe in maintaining a bug database! Relying on someone to keep track of a bug by essentially wading through an archive of forum posts or e-mails seems to be pretty much useless.

Posted on Dec 2, 2008 at 05:38 by Donny.

Ha, there had to be at least one 'eh. Thanks for pointing it out.

Posted on Dec 2, 2008 at 08:39 by Magnus.

Good point. I missed the perspective of mixing the paid customers with the community stuff in the same tracker, but I can see how that could be a good explanation for this happening.

Posted on Dec 2, 2008 at 08:40 by Magnus.

Interesting, but I'd like to point out that bug 37937 includes the following:

[7 Nov 11:12] Georgi Kodinov

Bug #37936 is probably a duplicate of this bug

[7 Nov 11:46] Georgi Kodinov

Duplicate of bug #37936.

Maybe not as much dirt here as is being made out to be? This took me a whole 1 minute to find. I should probably mention that I've had more luck getting MySQL installations to do replication & 5 nines than I have with Postgres -- but I'm still a supporter of both. I feel both projects are useful, it all just goes back to using the right tool for the job.

Posted on Dec 2, 2008 at 16:57 by Philip Snyder.

I think Magnus has illustrated that a bug tracking system (like MySQL has) doesn't make everything magically better.

Posted on Dec 3, 2008 at 21:43 by Jeff Davis.

I am the packager of postgresql in Mandriva, and from my own experience, the bug tracker is just a big TODO list with a lot of opened case you're not always able to solve.

I don't know if MySQL has tests suite, but PostgreSQL has. And again, being also a developers, a test suite increase more the quality than a bugs tracker everyone can ignore, especially since the tests are added for each new bugs discovered, denying to it to back into the software.

Posted on Dec 4, 2008 at 09:30 by Olivier Thauvin.

Hi! I have access to BUG#37936 and it doesn't seem to have anything more than what is already there in BUG#37937. I'll try to make it public.

Posted on Dec 5, 2008 at 01:03 by Sergey Petrunia.

Yes, MySQL has a testsuite, it's quite big and is a part of the server source. It is also automatically re-run every time somebody has pushed some change into any (5.0/5.1/6.0) tree.

Posted on Dec 5, 2008 at 01:09 by Sergey Petrunia.

Excellent Analysis. Another thing I've noticed is that each 'major' release of PostgreSQL seems to have fewer and fewer bugfix releases, even accounting for time. This suggests the development process is improving...

Posted on Dec 10, 2008 at 02:43 by merlin.

Conferences

I speak at and organize conferences around Open Source in general and PostgreSQL in particular.

Upcoming

Pycon Sweden
May 9-10, 2016
Stockholm, Sweden
PGCon
May 17-21, 2016
Ottawa, Canada
Stockholm PUG 2016/3
Jun 16, 2016
Stockholm, Sweden
PGDay UK 2016
Jul 05, 2016
London, UK
PG Day'16 Russia
Jul 6-8, 2016
St Petersburg, Russia
Stockholm PUG 2016/4
Aug 31, 2016
Stockholm, Sweden
Postgres Open
Sep 13-16, 2016
Dallas, USA
PGConf.EU 2016
Nov 1-Jan 4, 2016
Tallinn, Estonia

Past

Stockholm PUG 2016/2
Apr 26, 2016
Stockholm, Sweden
PGConf.US
Apr 18-20, 2016
New York, USA
pgDay Paris
Mar 31, 2016
Paris, France
Nordic PGDay
Mar 17, 2016
Helsinki, Finland
ConFoo
Feb 24-26, 2016
Montreal, Canada
More past conferences