Building PostgreSQL HA Systems with repmgr

Introduction

Welcome to our latest blog post! Today, we’ll delve into the intricacies of setting up PostgreSQL High availability (HA) production cluster for a prominent client in the banking sector. Our task? Designing and implementing a robust HA solution for their PODS, with PostgreSQL as the database backbone. Join us as we navigate through the process of establishing a four-node repmgr cluster, ensuring continuous availability and data integrity for our esteemed client.

Why repmgr

repmgr is a popular tool used for managing PostgreSQL replication and failover in high availability (HA) environments. It simplifies the setup and management of PostgreSQL replication by providing easy-to-use commands and tools for configuration, monitoring, and failover. One of its key features is its ability to automatically detect primary node failures and promote a standby node to become the new primary.

Prerequisites

Provision four EC2 instances.
Install PostgreSQL 14 on each node (why not the latest – Application compatibility)
Install repmgr on each node

Setup

Splitting the four nodes into 1 primary node, 2 standby nodes & 1 witness node.

Before setting up the repmgr , we need to add the IPv4 address of four nodes to a single security group in EC2 instances.

In Primary node:

Create the user and database accordingly and also do entries of remaining nodes in pg_hba.conf of primary server.

#Changes of parameters in Postgresql.conf

max_wal_senders = 10
max_replication_slots = 10
wal_level ='logical' #to enable logical decoding in future
hot_standby = on
archive_mode = on
archive_command = '/bin/true'
shared_preload_libraries = 'repmgr'

#Restart the services after changing the parameters in postgresql.conf file.

Then, we need to create a repmgr configuration file with the following parameters:

cluster='clientname_prod'
node_id=1
node_name=node1
conninfo='host=node1 user=myuser dbname=mydb connect_timeout=2'
data_directory='/var/lib/pgsql/14/data/'
failover=automatic
promote_command='/usr/pgsql-14/bin/repmgr standby promote -f /var/lib/pgsql/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-14/bin/repmgr standby follow -f /var/lib/pgsql/repmgr.conf --log-to-file --upstream-node-id=%n'

Now, we need to register the Primary using the command

/usr/pgsql-14/bin/repmgr -f /var/lib/pgsql/repmgr.conf primary register

#To check whether the primary server is registered or not,
/usr/pgsql-14/bin/repmgr -f /var/lib/pgsql/repmgr.conf  cluster show

Once the primary is registered, we can setup the standby node

In Standby node

Two steps:

One is standby clone and the other is standby register.
Before that, let us create the repmgr configuration file in the standby server.

node_id=2
node_name=node2
conninfo='host=(Standby IP address) user=myuser dbname=mydb connect_timeout=2'
data_directory='/var/lib/pgsql/14/data'
failover=automatic
promote_command='/usr/pgsql-14/bin/repmgr standby promote -f /var/lib/pgsql/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-14/bin/repmgr standby follow -f /var/lib/pgsql/repmgr.conf --log-to-file --upstream-node-id=%n'

Now, let us do a dry run before cloning using the command,

/usr/pgsql-14/bin/repmgr -h (Primary IP address) -U myuser -d mydb -f /var/lib/pgsql/repmgr.conf standby clone --dry-run

If it is successful, let us do the standby clone using the command

/usr/pgsql-14/bin/repmgr -h (Primary IP address) -U myuser -d mydb -f /var/lib/pgsql/repmgr.conf standby clone

Once the cloning is done ,Register the standby now using the command

/usr/pgsql-14/bin/repmgr -f /var/lib/pgsql/repmgr.conf standby register

Make sure that you stop the PostgreSQL server while doing the Standby clone and restart it before registering the standby.

Repeat the same process with the second standby too.

In Witness node

In the witness node, the parameter changes would need to be made in postgresql.conf file, and entries in pg_hba.conf need to be added similar to what we did in Primary; and also create the repmgr configuration file accordingly.

To register the witness node, use the command

/usr/pgsql-14/bin/repmgr -f /var/lib/pgsql/repmgr.conf witness register -h (Primary IP)

Now, we have all of the four nodes ready and registered as needed. You could use the cluster show command to check the nodes and verify that everything is in order.

Automate the failover

To automate the failover, there is a process called repmgrd daemon. It is a background process that runs continuously to manage PostgreSQL replication and failover using repmgr. It is a crucial component of the repmgr toolset, providing automated monitoring, management, and failover capabilities for PostgreSQL replication clusters.

To activate repmgrd, use the command

/usr/pgsql-14/bin/repmgrd -f /var/lib/pgsql/repmgr.conf

#To check the events, use the command:
/usr/pgsql-14/bin/repmgr -f /var/lib/pgsql/repmgr.conf cluster event

Conclusion

In conclusion, the successful implementation of the four-node repmgr cluster enabled our client to achieve HA for their Microservices architecture, bolstering data resilience and ensuring uninterrupted service delivery in a mission-critical banking environment. With repmgr’s robust features and our expertise, we provided a reliable and scalable solution tailored to the client’s specific requirements. To this setup , we also added PgBouncer and keepalived. If you want to know more about the architecture and how it works with a practical approach, please stay tuned for this series..