I just finished reloading the approximately 550,000 emails in our archives into my UTF8-encoded database (the earlier load had a bad encoding), so I decided to run a quick benchmark of the different ways of doing fulltext search on it with tsearch2, partly because Devrim requested it. I ran the same query a couple of times to bring things into the cache, and after that saw very stable performance on all the queries. The query I ran was:
explain analyze select count(*) from messages where fti @@ plainto_tsquery('magnus hagander');
That should be simple and self-explanatory. I got the following search times:
This shows that tsearch2 with GIN is a lot faster than the GiST version we had in 8.1. Updates are more expensive, but when your dataset sees many more reads than writes, GIN will definitely be a big win.
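For anyone who wants to try the same comparison, here is a rough sketch of how the two index types could be swapped on the fti column. It assumes the messages table and its tsvector column fti from the query above; the index names are made up for illustration, and this is not necessarily how my own test database was set up.

```sql
-- Assumption: messages.fti is the tsvector column used in the benchmark query.
-- The index names below are hypothetical.

-- GiST index, the only option before GIN was available
CREATE INDEX messages_fti_gist ON messages USING gist (fti);

-- Swap it for a GIN index: slower to build and update, faster to search
DROP INDEX messages_fti_gist;
CREATE INDEX messages_fti_gin ON messages USING gin (fti);

-- Re-run the benchmark query a few times after each change so the cache is warm
EXPLAIN ANALYZE SELECT count(*) FROM messages WHERE fti @@ plainto_tsquery('magnus hagander');
```

Only one of the two indexes should exist while timing, so the planner cannot pick the other one behind your back.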