Another step towards easier backups

Today I committed the first version of a new PostgreSQL tool, pg_basebackup. The backend support was committed a couple of weeks back, but this is the first actual frontend.

The goal of this tool is to make base backups easier to create, because they are unnecessarily complex in a lot of cases. Base backups are also used as the foundation for setting up streaming replication slaves in PostgreSQL, so the tool will be quite useful there as well. The most common way of taking a base backup today is something like (don't run this straight off, it's not tested, there are likely typos):

psql -U postgres -c "SELECT pg_start_backup('base backup')"
if [ "$?" != "0" ]; then
   echo Broken
   exit 1
fi
tar cfz /some/where/base.tar.gz --exclude "*pg_xlog*" /var/lib/pgsql/data
if [ "$?" != "0" ]; then
   echo Broken
   psql -U postgres -c "SELECT pg_stop_backup()"
   exit 1
fi
psql -U postgres -c "SELECT pg_stop_backup()"
if [ "$?" != "0" ]; then
   echo Broken
   exit 1
fi

And when you're setting up a replication slave, it might look something like this:

psql -U postgres -h masterserver -c "SELECT pg_start_backup('replication base', 't')"
if [ "$?" != "0" ]; then
   echo Broken
   exit 1
fi
rsync -avz --delete --progress postgres@masterserver:/var/lib/pgsql/data /var/lib/pgsql
if [ "$?" != "0" ]; then
   echo Broken
   psql -U postgres -h masterserver -c "SELECT pg_stop_backup()"
   exit 1
fi
psql -U postgres -h masterserver -c "SELECT pg_stop_backup()"
if [ "$?" != "0" ]; then
   echo Broken
   exit 1
fi

There are obvious variations - for example, I come across a lot of cases where people don't bother checking exit codes. Particularly for the backups, this is really dangerous.
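As a rough sketch of what careful exit-code handling could look like for the first example, the steps can be wrapped in a function so that pg_stop_backup() is always attempted once pg_start_backup() has succeeded. The function name take_base_backup and its two arguments are made up for illustration, and like the scripts above this has not been run against a live server:

```shell
# Sketch only: take_base_backup is a hypothetical helper, and the
# backup label and paths are placeholders as in the examples above.
take_base_backup() {
    datadir=$1
    target=$2

    # If we cannot enter backup mode there is nothing to clean up.
    psql -U postgres -c "SELECT pg_start_backup('base backup')" || {
        echo "pg_start_backup failed" >&2
        return 1
    }

    # Remember a failed copy, but keep going so backup mode is ended.
    status=0
    tar czf "$target" --exclude "*pg_xlog*" "$datadir" || status=1

    # Leave backup mode whether or not tar succeeded.
    psql -U postgres -c "SELECT pg_stop_backup()" || status=1

    if [ "$status" -ne 0 ]; then
        echo "Backup failed" >&2
    fi
    return "$status"
}
```

It would be called as `take_base_backup /var/lib/pgsql/data /some/where/base.tar.gz`, and the caller still has to check its return code.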

Now, with the new tool, both these cases become a lot simpler:

pg_basebackup -U postgres -D /some/where -Ft -Z9

That simple. -Ft makes the system write the output as a tarfile (actually, multiple tar files if you have multiple tablespaces, something the "old style" examples up top don't take into account). -Z enables gzip compression. The rest should be obvious...
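For readability in scripts, the same call can be spelled with long options and an exit-code check. This is only a sketch: backup_to_tar is a made-up helper name, and the long option names are the ones I know from released versions of pg_basebackup, so they are an assumption for this first committed version:

```shell
# Sketch only: backup_to_tar is a hypothetical helper and the target
# directory is a placeholder.
backup_to_tar() {
    target_dir=$1

    # --pgdata is -D, --format=tar is -Ft, --compress=9 is -Z9.
    if ! pg_basebackup --username=postgres --pgdata="$target_dir" \
            --format=tar --compress=9; then
        echo "pg_basebackup failed, do not trust $target_dir" >&2
        return 1
    fi

    # Show the archives that were written (one per tablespace).
    ls -l "$target_dir"
}
```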

In the second example - replication - you don't want a tarfile, and you don't want it on the same machine. Again, both are easily handled:

pg_basebackup -U postgres -h masterserver -D /var/lib/pgsql/data

That's it. You can also add -P to get a progress report (which you can normally not get out of tar or rsync, except on an individual file basis), and a host of other options.
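A small sketch of how the replication case might be wrapped for repeated use, with -P switched on and a guard against writing into a data directory that already has files in it. The helper name clone_from_master is hypothetical, and the host and path are placeholders:

```shell
# Sketch only: clone_from_master is a hypothetical helper; host and
# data directory are placeholders.
clone_from_master() {
    master=$1
    datadir=$2

    # Refuse to write into a directory that already has files in it.
    if [ -d "$datadir" ] && [ -n "$(ls -A "$datadir")" ]; then
        echo "$datadir is not empty, refusing to overwrite it" >&2
        return 1
    fi

    # -P prints a progress report while the data is copied.
    pg_basebackup -U postgres -h "$master" -D "$datadir" -P
}
```

Calling `clone_from_master masterserver /var/lib/pgsql/data` then returns the exit code of pg_basebackup itself.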

This is not going to be a tool that suits everybody. The current method is complex, but it is also fantastically flexible, letting you set things up in very environment-specific ways. That is why we are absolutely not removing any of the old ways; this is just an additional way to do it.

If you grab a current snapshot, you will have the tool available in the bin directory, and it will of course also be included in the next alpha version of 9.1. Testing and feedback are much appreciated!

There are obviously things left to do to make this even better. A few of the things being worked on are:

- Ability to run multiple parallel base backups. Currently, only one is allowed, but this is mainly a restriction based on the old method. Heikki Linnakangas has already written a patch that does this, that's just pending some more review.
- Ability to include all the required xlog files in the dump, in order to create a complete "full backup". Currently, you still need to set up log archiving for full Point In Time Recovery, even if you don't really need it. We hope to get rid of this requirement before 9.1.
- Another option is to stream the required transaction logs during the backup, not needing to include them in the archive at all. This is less likely to hit until 9.2.
- The ability to switch WAL level as necessary. For PITR or replication to work, wal_level must be set to archive or hot_standby, and changing this requires a restart of the server. The hope is to eventually be able to bump this from the default (minimal) at the start of the backup, and turn it back down when the backup is done. This is definitely not on the radar until 9.2 though.
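Until the WAL level can be switched automatically, a script can at least check it before starting. A minimal sketch, assuming the wal_level values named above (archive and hot_standby) and a made-up helper name, wal_level_ok:

```shell
# Sketch only: wal_level_ok is a hypothetical helper. The accepted
# values are the ones named in this post.
wal_level_ok() {
    # -A (unaligned) and -t (tuples only) make psql print just the value.
    level=$(psql -U postgres -At -c "SHOW wal_level") || return 1

    case "$level" in
        archive|hot_standby)
            return 0
            ;;
        *)
            echo "wal_level is '$level', expected archive or hot_standby" >&2
            return 1
            ;;
    esac
}
```

A backup script could then start with `wal_level_ok || exit 1`.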


Comments

Since the role being used for the base backup requires replication permissions (due to replication features being utilised), it might be worth mentioning that too. To facilitate testing, a set of instructions needed to get to the necessary state would be useful too. But nice work :)

Posted on Jan 23, 2011 at 13:45 by Thom Brown.

Really cool, thx a lot for this. It makes SR / HS easier to use.

Posted on Jan 23, 2011 at 14:07 by akretschmer.

Cool. Regarding "full backup" functionality, have you spoken with the pg_rman guys, so that you may collaborate? http://code.google.com/p/pg-rman/

Posted on Jan 23, 2011 at 14:30 by RDL.

There's actually a lot of re-inventions of this wheel, often with slightly different needs. I know we've been following Magnus' work in 9.1 wondering how/if we might change omnipitr to work with the new functionality. A lot of it is simpler, but he isn't solving the complicated problems like making backups for slave servers, so we're not out of the tool maintenance business yet. ;-)

Posted on Jan 31, 2011 at 01:30 by Robert Treat.

Sure, some of the functionality already existed in third-party tools - and some still lives on in the third party tools. There are several around. I looked towards omnipitr once, but I ran away when I tried to figure out what it was doing - that wasn't an easy read ;) But AFAIK, it still doesn't support doing things over libpq, does it? Which is the main reason for this patch in the first place...

Posted on Jan 31, 2011 at 07:44 by Magnus.