Step-by-Step Guide to PostgreSQL HA with Patroni: Part I

High availability (HA) in PostgreSQL refers to the ability of the database system to remain operational and accessible with minimal downtime, even in the event of a failure. In industries such as banking and finance, e-commerce, and healthcare, where uptime is critical to operations, keeping your PostgreSQL database highly available is essential. In the banking sector, for example, downtime or database unavailability can lead to significant disruptions such as delayed transactions, data loss, financial discrepancies, and damage to the business's reputation. For these sectors, maintaining continuous access to customer accounts, transaction processing, and other critical financial services is paramount.

To achieve high availability, PostgreSQL typically employs replication and automatic failover. In the context of banking systems, the goal is to ensure that if one database node fails, another can seamlessly take over without impacting transaction processing or application functionality. This ensures uninterrupted service to customers and continuous access to vital financial data. In our previous blogs, we discussed an HA architecture based on repmgr. In this blog, let's discuss another HA architecture based on Patroni.

Why Patroni

Patroni is widely used for high availability (HA) in PostgreSQL clusters because it provides automated failover and leader election through a distributed consensus mechanism backed by a DCS (distributed configuration store). It simplifies PostgreSQL high availability setups by managing the cluster's state and ensuring that a PostgreSQL instance remains available even when other nodes fail. Here are some of the advantages of using Patroni as an HA tool:

  • Automatic failover and recovery
  • Leader election via consensus
  • Flexibility in choosing the consensus store (DCS)
  • Open-source tool

Let's discuss the remaining components in the architecture:

HAProxy: A high-performance load balancer used to route traffic between the primary and standby PostgreSQL instances. It ensures that database connections are always directed to the current primary, providing failover capabilities by switching connections to a new primary in the event of a failure.
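For illustration, here is a minimal sketch of the write-traffic section of haproxy.cfg as it is typically used with Patroni; the node names, IP addresses, the listen port 5000, and the Patroni REST API port 8008 are placeholder assumptions, not the configuration covered in the next part. In this architecture the server lines could equally point at each node's local PgBouncer port instead of 5432.

# Minimal haproxy.cfg sketch: route writes to whichever node Patroni reports as primary
listen primary
    bind *:5000
    option httpchk GET /primary
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg_node1 10.0.1.11:5432 maxconn 100 check port 8008
    server pg_node2 10.0.1.12:5432 maxconn 100 check port 8008
    server pg_node3 10.0.2.11:5432 maxconn 100 check port 8008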

PgBouncer: A lightweight connection pooler for PostgreSQL that improves the performance of database connections by pooling and reusing them. It is particularly useful in environments with high connection churn and can be used with Patroni to optimize database connection management.
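As a rough illustration, a minimal pgbouncer.ini on a database node might look like the following; the database name, ports, auth file path, and pool sizes are assumptions for the sketch, not the final configuration.

; Minimal pgbouncer.ini sketch (hypothetical database name, paths, and pool sizes)
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 50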

etcd: A distributed key-value store used by Patroni for managing cluster state and leader election. It lets the nodes communicate and decide which PostgreSQL instance is the leader (primary) and which ones are standbys. There are other options besides etcd for this purpose, such as Consul and ZooKeeper, but here we'll focus the discussion exclusively on using etcd with Patroni.
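Once the etcd cluster is running, a quick sanity check of membership and health can be done with etcdctl; the endpoint addresses below are placeholders for the three etcd nodes.

# Check etcd cluster membership and health (etcd v3 API; IPs are placeholders)
export ETCDCTL_API=3
etcdctl --endpoints=http://10.0.1.11:2379,http://10.0.1.12:2379,http://10.0.2.11:2379 member list
etcdctl --endpoints=http://10.0.1.11:2379,http://10.0.1.12:2379,http://10.0.2.11:2379 endpoint health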

How these components work together

We are considering five servers in this architecture: three in one region and two in another. In the first region, I will deploy two PostgreSQL nodes, each with Patroni, etcd, and PgBouncer, plus one node running HAProxy. In the second region, we will have one PostgreSQL node configured the same way as in the first region, and one node for HAProxy.

With Patroni installed on each PostgreSQL node in the cluster, both the management of the local PostgreSQL instance and the updating of the distributed consensus store (etcd) are handled by the Patroni agent (the "bot"). etcd itself needs to be set up in a high availability (HA) configuration so that its individual nodes can reach consensus on the state of the cluster.

With etcd installed on the three PostgreSQL nodes, the primary database (i.e., the leader) is the PostgreSQL instance whose bot acquires the leader key in etcd first. etcd uses the Raft consensus algorithm to guard against race conditions. Once a bot receives confirmation that it holds the key, it configures its PostgreSQL instance to act as the primary. The election of a primary is visible to all other nodes, at which point their bots configure their PostgreSQL instances as replicas.
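Once the cluster is formed, the elected leader and the replicas can be inspected with patronictl, or the leader key can be read directly from etcd. The config file path, the cluster (scope) name pg_cluster, and the default /service namespace below are assumptions for the sketch.

# Show cluster members and the current leader
patronictl -c /etc/patroni/patroni.yml list
# Read the leader key straight from etcd (v3 API, default /service namespace assumed)
etcdctl get --prefix /service/pg_cluster/leader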

HAProxy monitors changes in the primary/standby roles and connects clients to the current primary when they request a connection. It determines which node is the primary by calling the Patroni REST API, which is configured to run on port 8008 on each database node. It can also balance the load, routing writes to the primary and reads to the standbys.
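These are the same endpoints HAProxy polls for its health checks; they can also be queried manually to see which role a node currently holds (the hostname below is a placeholder).

# Returns HTTP 200 if this node is the current primary, 503 otherwise
curl -s -o /dev/null -w "%{http_code}\n" http://pg_node1:8008/primary
# Returns HTTP 200 if this node is a healthy replica
curl -s -o /dev/null -w "%{http_code}\n" http://pg_node1:8008/replica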

PgBouncer helps manage and optimize the connections between HAProxy and the PostgreSQL database, especially in high-traffic environments.

This architecture is designed to provide a robust, self-managing database cluster where high availability is a paramount concern. By employing Patroni, etcd, and HAProxy in unison, the setup achieves automatic failover, efficient load balancing, and consistent replication, thereby delivering a resilient and high-performing database environment.

Failover Times: Failover times may not always be instantaneous, depending on the cluster’s state and the reasons for failover. There may be a short period of unavailability while a new leader is elected and the cluster is reconfigured.

Installation

There are different ways to install these components: a yum-based installation or an rpm-based installation. Here, I used the rpm-based installation because the PostgreSQL nodes were created in an air-gapped environment and did not have direct access to the internet. The first go-to repository is PGDG (I'm assuming PostgreSQL is already installed, so its installation is not covered here):
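For comparison, if the servers did have internet access and the PGDG repository configured, the same components could be installed directly with the package manager; the package names below are assumed to match the rpms used in the steps that follow.

# Online alternative (assumes the PGDG repository is already configured)
dnf install -y haproxy patroni etcd pgbouncer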

HAProxy

Here are the installation steps. I'm including the link to the rpm along with the requisite dependencies.

#To know the dependencies of haproxy
yum deplist haproxy
#To download the rpm 
wget https://download.postgresql.org/pub/repos/yum/common/pgdg-rhel8-extras/redhat/rhel-8-x86_64/haproxy-3.1.1-1PGDG.rhel8.x86_64.rpm
#To install the downloaded rpm
rpm -ivh haproxy-3.1.1-1PGDG.rhel8.x86_64.rpm
#To verify the installation
haproxy --version

Patroni with etcd

Here is the installation of Patroni along with etcd:

#To know the dependencies of patroni
yum deplist patroni
yum deplist etcd
#To download the rpm for patroni and etcd 
wget https://download.postgresql.org/pub/repos/yum/common/redhat/rhel-8-x86_64/patroni-4.0.4-1PGDG.rhel8.noarch.rpm
wget https://download.postgresql.org/pub/repos/yum/common/pgdg-rhel8-extras/redhat/rhel-8-x86_64/etcd-3.5.9-1.rhel8.x86_64.rpm
# To install the downloaded rpms
rpm -ivh patroni-4.0.4-1PGDG.rhel8.noarch.rpm
rpm -ivh etcd-3.5.9-1.rhel8.x86_64.rpm
#To verify the installation
patroni --version
etcd --version

PgBouncer

#To know the dependencies of pgbouncer
yum deplist pgbouncer
#To download the rpm for pgbouncer 
wget https://download.postgresql.org/pub/repos/yum/common/redhat/rhel-8.9-x86_64/pgbouncer-1.24.0-42PGDG.rhel8.x86_64.rpm
# To install the downloaded rpms
rpm -ivh pgbouncer-1.24.0-42PGDG.rhel8.x86_64.rpm
#To verify the installation
pgbouncer --version

Stay tuned for the configuration part and the testing of the cluster's resilience in the next blog.

Summary

To summarize, this blog presented a robust high availability (HA) architecture for PostgreSQL leveraging Patroni, etcd, HAProxy, and PgBouncer to ensure minimal downtime and high resilience. We discussed how each component plays a crucial role in achieving HA: Patroni handles automatic failover, leader election, and replication management; etcd serves as a distributed key-value store enabling consensus and state management; HAProxy provides load balancing and ensures traffic is directed to the correct node; and PgBouncer optimizes connection pooling to improve performance in high-traffic environments. This architecture is especially critical in sectors like banking, healthcare, and e-commerce, where continuous database availability is paramount. By combining these components, we ensure automatic recovery from failures, efficient database connection management, and high scalability. The installation process is straightforward, and by implementing this solution, organizations can maintain reliable, fault-tolerant systems. In the upcoming blog, we will delve deeper into the configuration and testing of this HA setup to ensure everything functions seamlessly.
