PostgreSQL on Windows has suffered from running out of "Desktop Heap" since day one. Things got worse with 8.3beta1, but Dave got that fixed for beta2. However, someone on the list actually managed to run into a different memory scalability problem - running out of address space in the postmaster.
So how could this happen? Because there were too many threads running. Hold on, you think, PostgreSQL isn't threaded?! Which is correct - PostgreSQL on Windows uses the same process based model as it does on Unix. The reason for this is to make sure that the database works the same way on all platforms, which cuts down platform-specific bugs and makes maintaining easier. However, a couple of win32 specific functions are implemented as separate threads. In the postmaster, for example, we start one thread for each child process spawned. The only purpose of this thread is to wait until the child process dies, and then fire off a SIGCHLD signal - since Windows doesn't provide this functionality built in like Unix does.
So with a lot of backends, this leaves us with a whole lot of threads running in the postmaster that doesn't really do anything. Since they don't do anything, this method was chosen originally for 8.0 with the motivation that the code is trivial and that the overhead of starting a backend process is so big that the starting of a secondary thread won't be noticed at all. Which is correct in all aspects except for one - the thread will require 4Mb of address space for it's stack. Most of this is never put into real memory (only the required initial 64Kb are), but it counts against the address space and the commit charge. This gives two possible resources that can run out - room in the pagefile (for the commit charge) and the address space in the process (max 2Gb memory can be addressed by a 32-bit windows process by default). And with 4Mb / thread, you get up there fairly quickly - something nobody thought about at the time.
So for 8.3beta2, I have rewritten the handling of SIGCHLD from scratch. Instead of creating a thread for each backend, we now use an operating system thread pool to do the waiting, and then queue up the child process exists on an I/O completion port. The second part of switching to using an I/O completion port actually simplified and reduced the amount of code necessary pretty substantially, which is always nice.
So did this help? Some pgbench figures from Dave shows:
Without patch: VM: 1,322,792K Idle threads: 6 Peak threads: 306 With patch: VM: 98,088K Idle threads: 3 Peak threads: 7
So yes, it makes a huge difference. Yay! Apart from fixing the address space issue, this should also decrease the pagefile requirements for any environments with a lot of backends. You're still going to see significantly worse performance than on Unix with lots of backends - shared memory access is still slow, etc. But it does represent a fairly big step forward.
Even the fact that mingw is once again missing the required functions thus causing breakage on the buildfarm makes that difference smaller. And a patch is in the works to fix that...
Thanks to Trevor and Dave for helping me track this down and test the resulting patch.