PostgreSQL Replication and High Availability

PostgreSQL Streaming Replication

* Update `.pgpass` file on the Target Host *

An important aspect of an enterprise database engineering is replication to ensure maximum availability. PostgreSQL provides (as of this writing) four options for replication. Each offers it's own features, benefits and issues. This is only an overview, hopefully leading the reader to the more relevant packages for their needs.

PostgreSQL Replication Packages

PGCluster - Multi-master no delay synchronus replication for load sharing or HA. Large objects are now supported.
Slony-I - Master to multi-slave cascading and almost-failover.
DBBalancer - Alpha
pgpool - Connection pooling front end with synchronous replication
PostgreSQL table comparator - rsync for PostgreSQL.

PGCluster

This package is in rapid development and looks very promising, needs PostgreSQL 8 or better. It has the features that are needed in real HA environments, think financial, legal industries. Multi-Master allows for load-balancing and fail over type high availabilty. PGCluster 1.0 started with this, 1.1 added large-object replication and 1.3 is an nice refinement. The documentation for this package is well done and easy to understand.

Slony-I

The Slony-I (one) package is well developed, documented and works with PostgreSQL 7.3 and newer. Slony does not do multi-master replication like PGCluster and does not support large objects. Slony is also suitable for use in envrionments that require high-availability.

pgpool

Pgpool is a connection pooling server that supports up two PostgreSQL servers in synchronous replication. This replication mode operates in degeneration mode when one server has failed, the data must be manually synchronized before restarting. The load-balancing features are restricted to non-transactional SELECT queries. The documentation for this package is well done and easy to understand.

pg_comparator

Pg_comparator is effectively rsync for PostgreSQL. The reader is expected to understand what rsync is and how the concept would operate with database tables. Only one table can be replicated at a time. This is a nice package for moving development code into test or test to live and the like. Recovering from a Slony failure this will come in handy.

Comparisons

PGCluster and Slony are the two most advanced packages mentioned here. PGCluster's multi-master cabability and large object support make it shine and the rapid development bodes well for the future. PGCluster also supports rapid recovery from failure when combingin it's load-balancer with the replicator. Slony is designed more for data centers to keep hot-backups of the data servers. Slony does not support hot-failover but does support planned switch over.

Pg_pool is a simpler concept than PGCluster and Slony and understandably less featured. Designed for pooling not so much for HA and load-balancing as PGClusetr or Slony. If one is feeling adventurous pg_pool can be configured in synchronous replication and connect to Slony masters.

Pg_comparator is not replication, best used for synchronizing two tables. This type of tool could be used when bringing a failed Slony master back-online.

Bottom Line

Seems like PGCluster is the package to use for those that really need that. For almost real-time backups one could use Slony, remember it's failover features aren't as complete as PGCluster. A mostly static datbase, say a read-only copy for web content could benefit from pgpool by using non-transactional selects. Pg_comparator could be used for this read-only datbase server pool to push changes into the live server.

ChangeLog

12 Jan 2005 - Created /djb