Something I've had to do fairly often in SQL is finding the difference between two sets. A typical case which came up again this week is when you have a set of snapshots in a history table, and want to find which keys were added and removed between the two occassions. One way to do this is with a set of queries - one for what was added, one for what was removed, and possibly one for what wasn't changed (if you need that data). You can also do this with OUTER JOINs, which can often be a lot more efficient given that you can answer all your questions in one query.
So, to illustrate this, here's a simple, sample setup. It's basically a table that contains snapshots of a set of personnummer (swedish social security numbers, but it can of course be anything you want. If it's a single unique field, that makes life a lot easier). Snapshots are generated at a certain date by a scheduled taks that inserts the result of a select on a different set of tables. For this illustration, I'll just insert a couple of bogus rows of data.
CREATE TABLE snapshots (
snapdate timestamptz NOT NULL,
personnr char(13) NOT NULL,
CONSTRAINT pk_snapshots PRIMARY KEY (snapdate,personnr)
);
INSERT INTO snapshots VALUES ('2006-12-01 00:00:00','11111111-1111');
INSERT INTO snapshots VALUES ('2006-12-01 00:00:00','22222222-2222');
INSERT INTO snapshots VALUES ('2006-12-01 00:00:00','33333333-3333');
INSERT INTO snapshots VALUES ('2006-12-01 00:00:00','44444444-4444');
INSERT INTO snapshots VALUES ('2006-12-02 00:00:00','22222222-2222');
INSERT INTO snapshots VALUES ('2006-12-02 00:00:00','33333333-3333');
INSERT INTO snapshots VALUES ('2006-12-02 00:00:00','55555555-5555');
INSERT INTO snapshots VALUES ('2006-12-02 00:00:00','66666666-6666');
Now, to simply get whatever rows were either added or removed between these two sets, use the following query:
<code style="white-space:pre">
SELECT
COALESCE(s1.personnr,s2.personnr) AS personnr,
CASE WHEN s1.personnr IS NULL THEN 'Added'
ELSE 'Removed' END AS action
FROM
(SELECT personnr FROM snapshots WHERE snapdate='2006-12-01 00:00:00')
AS s1
FULL OUTER JOIN
(SELECT personnr FROM snapshots WHERE snapdate='2006-12-02 00:00:00')
AS s2
ON s1.personnr=s2.personnr
WHERE s1.personnr IS NULL OR s2.personnr IS NULL
If you want to see both the changed and the unchanged rows, remove the WHERE statement and expand the CASE like this:
SELECT
COALESCE(s1.personnr,s2.personnr) AS personnr,
CASE WHEN s1.personnr IS NULL THEN 'Added'
WHEN s2.personnr IS NULL THEN 'Removed'
ELSE 'Unchanged' END AS action
FROM
(SELECT personnr FROM snapshots WHERE snapdate='2006-12-01 00:00:00')
AS s1
FULL OUTER JOIN
(SELECT personnr FROM snapshots WHERE snapdate='2006-12-02 00:00:00')
AS s2
ON s1.personnr=s2.personnr ```
If you want only changes in one direction, you should be using LEFT or RIGHT JOIN instead, and if you want just the ones not changed, INNER JOIN. You can still do it with OUTER JOIN, but it will likely be much better performing using the proper JOIN for the case.
New comments can no longer be posted on this entry.