Revamping the postgresql.org web search, part 3

Just a couple of notes during the further progress I've made:

  • While OOP in PHP is certainly pretty far from polished, it is a lot nicer than in Perl. As for the actual implementation details, I think they pretty much even out in the end - but the fact that I couldn't get encoding to work at all in Perl was the killer for it. So the PHP implementation will be the one that's used.

  • Being able to use persistent connections when connecting to websites to download their content for indexing would give noticeable speedups, specifically over slow connections. But doing that requires implementing HTTP/1.1, which in turn requires implementing chunked encoding. Something for the future - I can still full-index all the sites we pull down except the archives (a little over 200 sites) in less than 10 minutes. And archives index pretty fast anyway, since Josh has kindly set me up with a box that lives on the same network as the archives server.

  • Many sites don't implement If-Modified-Since properly. Luckily I've been able to bug a couple of the site-owners into fixing it, given that they are pg sites and I have "fairly good connections" with some of the webmasters there. Common problems include not implementing it at all, or just comparing exact values instead of ranges (this second one is actually mentioned in a lot of places as a caveat for implementations, but hey, I want to us eit..)

  • I need to tune my tsearch2 dictionaries. Just running on the standard one now, can probably be a lot more efficient using ispell and/or snowball. And tune some stopwords.

  • While not done yet, this is progressing nicely and I should be able to move to proper testing fairly soon. Yay.

(Updated: removed note about headline() because obviously I can't read my own testresults properly!)


Conferences

I speak at and organize conferences around Open Source in general and PostgreSQL in particular.

Upcoming

Pycon Sweden
May 9-10, 2016
Stockholm, Sweden
PGCon
May 17-21, 2016
Ottawa, Canada
Stockholm PUG 2016/3
Jun 16, 2016
Stockholm, Sweden
PG Day'16 Russia
Jul 6-8, 2016
St Petersburg, Russia
Stockholm PUG 2016/4
Aug 31, 2016
Stockholm, Sweden
Postgres Open
Sep 13-16, 2016
Dallas, USA
PGConf.EU 2016
Nov 1-Jan 4, 2016
Tallinn, Estonia

Past

Stockholm PUG 2016/2
Apr 26, 2016
Stockholm, Sweden
PGConf.US
Apr 18-20, 2016
New York, USA
pgDay Paris
Mar 31, 2016
Paris, France
Nordic PGDay
Mar 17, 2016
Helsinki, Finland
ConFoo
Feb 24-26, 2016
Montreal, Canada
More past conferences