Finding gaps in partitioned sequences

There are an almost unlimited number of articles on the web about how to find gaps in sequences in SQL. And it doesn't have to be very hard. Doing it in a "partitioned sequence" makes it a bit harder, but still not very hard. But when I turned to a window aggregate to do that, I was immediately told "hey, that's a good example of a window aggregate to solve your daily chores, you should blog about that". So here we go - yet another example of finding a gap in a sequence using SQL.

I have a database that is very simply structured - it's got a primary key made out of (groupid, year, month, seq), all integers. On top of that it has a couple of largish text fields and an fti field for full text search. (Initiated people will know right away which database this is). The sequence in the seq column resets to zero for each combination of (groupid, year, month). And I wanted to find out where there were gaps in it, and how big they were, to debug the tool that wrote the data into the database. This is really easy with a window aggregate:

      seq-lag(seq,1) OVER (PARTITION BY groupid, year, month ORDER BY seq) AS gap FROM mytable
) AS t
ORDER BY groupid, year, month, seq

One advantage to using a window aggregate for this is that we actually get the whole row back, and not just the primary key - so it's easy enough to include all the data you need to figure something out.

What about performance? I don't really have a big database to test this on, so I can't say for sure. It's going to be a sequential scan, since I look at the whole table,and not just parts of it. It takes about 4 seconds to run over a table of about a million rows, 2.7Gb, on a modest VM with no actual I/O capacity to speak of and a very limited amount of memory, returning about 100 rows. It's certainly by far fast enough for me in this case.

And as a bonus, it found me two bugs in the loading script and at least one bug in somebody elses code that I'm now waiting on to get fixed...


what is seq-lag ? i assume this is a user defined func.

Posted on Jan 30, 2012 at 17:23 by mark.

No, there are no user defined functions in this.

seq is the column name, then minus the operator, then lag the builtin window function. So a standard integer minus, while calling the window function lag(), which is a standard one.

Posted on Jan 30, 2012 at 17:25 by Magnus Hagander.

thanks, sorry I should have noticed that was the seq MINUS the lag function.

p.s. you example may have a typo in it.

groupid != gropid, line 3

Posted on Jan 31, 2012 at 19:01 by mark.

Hah, thanks for pointing that out. That's what I get for trying to clean up the formatting when posting it...

Posted on Feb 6, 2012 at 10:18 by Magnus Hagander.


I speak at and organize conferences around Open Source in general and PostgreSQL in particular.


Stockholm PUG 2016/5
Oct 25, 2016
Stockholm, Sweden
PGConf.EU 2016
Nov 1-4, 2016
Tallinn, Estonia
Berlin PUG
Nov 17, 2016
Berlin, Germany
PGConf.Asia 2016
Dec 2-3, 2016
Tokyo, Japan
FOSDEM + PGDay 2017
Feb 2-4, 2017
Brussels, Belgium


Postgres Vision 2016
Oct 11-13, 2016
San Francisco, USA
Postgres Open
Sep 13-16, 2016
Dallas, USA
Stockholm PUG 2016/4
Aug 31, 2016
Stockholm, Sweden
PG Day'16 Russia
Jul 6-8, 2016
St Petersburg, Russia
PGDay UK 2016
Jul 05, 2016
London, UK
More past conferences