PostgreSQL Connection Failover and Load Balancing with libpq

Introduction:

As database administrators, our primary goal is to ensure data redundancy. One popular method to achieve this is replication failover. Regardless of the number of primary and standby nodes within a data centre (DC) and disaster recovery (DR) setup, if the database is not accessible to the application, the failover mechanism is not useful for the business.

Understanding `libpq`:

Libpq is a client library written in C that facilitates communication between applications and PostgreSQL databases. It is widely used in developing software that interacts with PostgreSQL. Popular tools like pgbench and psql utilize libpq for communication with PostgreSQL.

Features of `libpq`:

PostgreSQL offers a unique client access library called libpq, which provides a failover mechanism for client applications with multiple host connections simultaneously. It includes attribute options like read-write and any to handle both primary and standby servers. In PostgreSQL 16, load balancing was introduced directly from the application connection.

Use cases with `libpq`:

When the primary node is alive, the application connects to the primary. When it is down, the replication configuration automatically upgrades one of the standbys to primary. Here, the connection string and/or IP address could be different – unless you’re using a HAProxy or a VIP. In this scenario, the libpq library will automatically detect which database node is primary and connect to that host from the given pool of hosts.

Connection Mechanism:

It’s important to note that the driver may take extra time to connect to each node in the list to determine if it is the master. For instance, if the server at 192.168.226.128 is no longer the master, and 192.168.226.129 (the second server in the connection string) has become the new master accepting writes, the driver will first check if the initial server allows writes. If the first server is unreachable, the driver will then attempt to connect to the second one. This may introduce additional delay, but the failover remains seamless, ensuring that the application does not need to be interrupted during the switchover.

Practical Example:

To better understand this, let’s look at a simple scenario. I have set up a three-node replication cluster:

192.168.226.128 – Master
192.168.226.129 – First Standby
192.168.226.130 – Second Standby

Connecting to the Master Using Read-Write Mode:

Here’s an example of how to achieve this using Python. I’ve written a simple Python script where I set target_session_attrs to “read-write” and provided multiple IPs for the host. When I run the script, it verifies the connected IP (in this case, 192.168.226.128 as the master) and confirms that the server is not in recovery mode.

def init_db_connection():
    conn = psycopg2.connect(
        database="postgres",
        host="192.168.226.129,192.168.226.128,192.168.226.130",
        user="postgres",
        password="secret",
        port="5432",
        target_session_attrs="read-write",
        load_balance_hosts="random"
    )
    return conn

@app.route('/status')
def check_status():
    conn = init_db_connection()
    cur = conn.cursor()
    cur.execute("SELECT pg_is_in_recovery(), inet_server_addr()")
    status = cur.fetchone()
    cur.close()
    conn.close()
    return f"Server in recovery: {status[0]}, IP address: {status[1]}"

The above /status route connects to the primary server. You can see this in the browser:

Server in recovery: False, IP address: 192.168.226.128

If the primary server changes, the application will automatically connect to the new primary IP address.

Connecting to Any Server for Reads:

For select queries, we can leverage the load balancing feature of libpq. With the load_balance_hosts option, the IPs are shuffled every time a connection is established, so a random host will connect to the application server.

def get_db_connection():
    conn = psycopg2.connect(
        database="postgres",
        host="192.168.226.129,192.168.226.128,192.168.226.130",
        user="postgres",
        password="secret",
        port="5432",
        target_session_attrs="any",
        load_balance_hosts="random"
    )
    return conn

@app.route('/')
def index():
    conn = get_db_connection()
    cur = conn.cursor()
    # Execute the query to get recovery status and server IP address
    cur.execute("SELECT pg_is_in_recovery(), inet_server_addr()")
    result = cur.fetchone()  # Fetch the result (a single row)
    # `result` is a tuple, where result[0] is the recovery status, and result[1] is the server IP address
    recovery_status, ip_address = result
    print(f"SELECT: Connected to database on IP: {ip_address}, Recovery status: {recovery_status}")
    # Fetch items data
    cur.execute('SELECT * FROM items')
    items = cur.fetchall()
    cur.close()
    conn.close()
    return render_template('index.html', items=items)

Example output:

SELECT: Connected to database on IP: 192.168.226.128, Recovery status: False
127.0.0.1 - - [25/Sep/2024 07:38:27] "GET / HTTP/1.1" 200 -
SELECT: Connected to database on IP: 192.168.226.129, Recovery status: False
127.0.0.1 - - [25/Sep/2024 07:38:27] "GET / HTTP/1.1" 200 -
SELECT: Connected to database on IP: 192.168.226.129, Recovery status: False
127.0.0.1 - - [25/Sep/2024 07:38:35] "GET / HTTP/1.1" 200 -
SELECT: Connected to database on IP: 192.168.226.129, Recovery status: False
127.0.0.1 - - [25/Sep/2024 07:38:35] "GET / HTTP/1.1" 200 -
SELECT: Connected to database on IP: 192.168.226.129, Recovery status: False
127.0.0.1 - - [25/Sep/2024 07:38:35] "GET / HTTP/1.1" 200 -
SELECT: Connected to database on IP: 192.168.226.131, Recovery status: False

Conclusion:

PostgreSQL’s libpq library offers a robust solution for seamless database failover and load balancing. By leveraging its features, such as multiple host connections, read-write attributes, and load balancing, application developers can ensure high availability and reliability of their connections to PostgreSQL databases. This not only minimizes downtime but also enhances the overall performance and resilience of the application. Implementing libpq in your setup can significantly improve the accessibility and efficiency of your database operations, making it a valuable tool for any organization relying on PostgreSQL.

Introduction:

Understanding libpq:

Features of libpq:

Use cases with libpq: