Home > Software engineering >  Multi-Region Availability for Postgres Database to be able to offer our clients 4 9's of avail
Multi-Region Availability for Postgres Database to be able to offer our clients 4 9's of avail

Time:09-19

We use Google Cloud SQL, and when deployed in High Availability mode, it offers an SLA of 99.95%. Google offers its high availability still within the same region, just in a different zone. We have clients who are asking for an SLA of 99.99%, and are willing to pay for 99.999%.

The only thing we've thought of is creating a read replica in another region that we could fail over to if there is such an outage. Such a fail over would be manual though. We would have to 1. Take down our primary database. 2. Promote the read replica 3. Redeploy our servers, changing the environment variable for our database.

With everything being so manual, it seems difficult to be able to offer this as part of our SLA, as such a change would take ~30 minutes. We would need to setup something to be automatic.

Surely this isn't a common problem. There has to be a better way to achieve higher availability for postgres. What do other companies do?

CodePudding user response:

I have been managing PostgreSQL High Availability cluster. We did failover test on VMWare on premise with multi region deployment. You need below steps for failover:

  1. Promote standy server which is there in DR region using pg_promote
  2. Run CHECKPOINT on promoted done, so that other nodes who will join this newly promoted nodes gets upto date information.
  3. Join other cluster nodes with this newly promoted node using pg_rewind for diff syncing only. If you deploy new node, then need to use pg_basebackup to join this new node.

Bonus TIP: If replication slot is supported, then you should create replication slot for every other cluster nodes as soon as you perform step 1.

All above steps need to perform as a sidecar monitoring using some tool

  • Related