The 101s of Postgres Streaming Replication

Introduction:

Embarking on my first customer engagement to set up PostgreSQL Streaming Replication was an insightful journey. As a Database Engineer, this was my first hands-on experience with implementing streaming replication in a real-world environment. The stakes were high, as the customer required a robust and reliable high-availability solution to ensure their critical applications remained operational without interruption.

From the initial planning stages to the final deployment, every step of the process brought new challenges and learning opportunities. The customer’s environment was complex, with multiple databases running on different nodes, each requiring seamless synchronization to maintain data integrity and consistency. My goal was to create a replication setup that not only met their high availability needs but also provided the scalability and performance necessary to support their growing business.

In this blog post, I’ll share my experience setting up PostgreSQL Streaming Replication. Whether you’re a seasoned DBA or new to the world of database replication, I hope my journey will provide valuable insights and practical tips for successfully implementing streaming replication in your own PostgreSQL environment.

The streaming replication of a standby database server is configured to connect to the primary server, which streams WAL (Write-Ahead Logging) records to the standby as they are generated, without waiting for the WAL file to be filled. By default, streaming replication is asynchronous, where data is written to the standby server after a transaction has been committed on the primary server. This means that there is a small delay between committing a transaction in the master server and the changes becoming visible in the standby server.

Step 1 :Configure Primary Server

Open postgresql.conf file on the primary server and make this changes:

listen_addresses = '*'  
wal_level = replica      
max_wal_senders = 5
hot_standby=on

Step 2: Enable Replication User

Open pg_hba.conf file and add an entry to allow replication connections from the standby server:

host    replication    replication_user    standby_ip/32    md5
#Restart PostgreSQL to apply the changes.

Step 3: Create database

postgres=# CREATE DATABASE mydb;CREATE DATABASE #Switch to database : 
\c mydb

Step 4: Create the user

Create a user in master using whichever slave should connect for streaming the WALs. This user must have replication role

mydb=# CREATE ROLE replica WITH REPLICATION ENCRYPTED PASSWORD 'postgres';
CREATE ROLE

Step 5: Taking pg_basebackup

pg_basebackup helps us to stream the data through the wal sender process from the master to a slave to set up replication.

[postgres@master ~]$ pg_basebackup -h 10.206.0.2 \
                                   -p 5433 -D /var/lib/pgsql/14/data/ \
                                   -U replica  -P -v -R -X stream \
                                   -C -S book1 
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created replication slot "book1"
34874/34874 kB (100%), 1/1 tablespace                                         
pg_basebackup: write-ahead log end point: 0/2000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed

Step 6: Check for Standby Signal

Notice that standby.signal is created and the connection settings are appended to postgresql.auto.conf.

[postgres@master ~]$ cd /var/lib/pgsql/14/data
# Check for postgresql.conf and standby.signal file in the data directory.

[postgres@master data]$ ls -ltr
-rw-------. 1 postgres postgres    225 May 29 02:41 backup_label
-rw-------. 1 postgres postgres 180563 May 29 02:42 backup_manifest
drwx------. 6 postgres postgres     54 May 29 02:42 base
-rw-------. 1 postgres postgres     30 May 29 02:42 current_logfiles
drwx------. 2 postgres postgres   4096 May 29 02:42 global
drwx------. 2 postgres postgres     32 May 29 02:42 log
drwx------. 2 postgres postgres      6 May 29 02:42 pg_commit_ts
drwx------. 2 postgres postgres      6 May 29 02:42 pg_dynshmem
-rw-------. 1 postgres postgres   4929 May 29 02:42 pg_hba.conf
-rw-------. 1 postgres postgres   1636 May 29 02:42 pg_ident.conf
drwx------. 4 postgres postgres     68 May 29 02:42 pg_logical
drwx------. 4 postgres postgres     36 May 29 02:42 pg_multixact
drwx------. 2 postgres postgres      6 May 29 02:42 pg_notify
drwx------. 2 postgres postgres      6 May 29 02:42 pg_replslot
drwx------. 2 postgres postgres      6 May 29 02:42 pg_serial
drwx------. 2 postgres postgres      6 May 29 02:42 pg_snapshots
drwx------. 2 postgres postgres      6 May 29 02:42 pg_stat
drwx------. 2 postgres postgres      6 May 29 02:42 pg_stat_tmp
drwx------. 2 postgres postgres      6 May 29 02:42 pg_subtrans
drwx------. 2 postgres postgres      6 May 29 02:42 pg_tblspc
drwx------. 2 postgres postgres      6 May 29 02:42 pg_twophase
-rw-------. 1 postgres postgres      3 May 29 02:42 PG_VERSION
drwx------. 3 postgres postgres     60 May 29 02:42 pg_wal
drwx------. 2 postgres postgres     18 May 29 02:42 pg_xact
-rw-------. 1 postgres postgres    372 May 29 02:42 postgresql.auto.conf
-rw-------. 1 postgres postgres  28724 May 29 02:42 postgresql.conf
-rw-------. 1 postgres postgres      0 May 29 02:42 standby.signal

Step 7: Check the replication slot

Now connect the master server, you should be able to see the replication slot called book1 when you open the pg_replication_slots view as follows.

mydb=# select * from pg_replication_slots ;
-[ RECORD 1 ]-------+----------
slot_name           | book1
plugin              | 
slot_type           | physical
datoid              | 
database            | 
temporary           | f
active              | t
active_pid          | 3506
xmin                | 
catalog_xmin        | 
restart_lsn         | 0/3018E38
confirmed_flush_lsn | 
wal_status          | reserved
safe_wal_size       | 
two_phase           | f

#Now start PostgreSQL on a slave(standby) server.

Step 8: Check the status on Standby

We can check the status on standby using the below command.

mydb=# select * from pg_stat_wal_receiver ;
-[ RECORD 1 ]---------+------------------------------------------------------
pid                   | 3487
status                | streaming
receive_start_lsn     | 0/3000000
receive_start_tli     | 1
written_lsn           | 0/3018E38
flushed_lsn           | 0/3018E38
received_tli          | 1
last_msg_send_time    | 2024-05-29 07:15:34.249873+00
last_msg_receipt_time | 2024-05-29 07:15:34.376013+00
latest_end_lsn        | 0/3018E38
latest_end_time       | 2024-05-29 03:32:03.18865+00
slot_name             | book1
sender_host           | 10.206.0.2
sender_port           | 5433
conninfo              | user=replica passfile=/var/lib/pgsql/.pgpass channel_binding=prefer dbname=replication host=10.206.0.2 port=5433 fallback_application_name=walreceiver sslmode=prefer sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres target_session_attrs=any

Conclusion:

Setting up PostgreSQL Streaming Replication for the first time was both a challenging and rewarding experience. This journey taught me the importance of meticulous planning, a thorough understanding of the underlying technology, and the ability to troubleshoot issues as they arise. Here are some key takeaways from my experience:

Understanding the Basics: Gaining a solid grasp of how streaming replication works is crucial. Knowing the differences between synchronous and asynchronous replication, and when to use each, can make a significant impact on your setup’s reliability and performance.
Proper Configuration: The configuration of the primary and standby servers is the backbone of a successful replication setup. Ensuring that parameters like wal_level, max_wal_senders, and archive_mode are correctly set is vital.
Monitoring and Maintenance: Implementing monitoring tools to track replication status and performance is critical. Regular maintenance, such as vacuuming and analyzing databases, helps keep the replication environment healthy.
Testing Failover: Regularly testing failover procedures ensures that the replication setup is reliable and that you can quickly recover in case of a primary server failure. This practice builds confidence in the system’s resilience. I’ll be covering this topic in the next blog post.

My first foray into PostgreSQL Streaming Replication was a significant learning curve, but it ultimately reinforced my skills and confidence in managing high-availability database systems. I hope this blog post has provided you with a comprehensive overview and practical insights into setting up streaming replication. Whether you are a novice or a seasoned DBA, these lessons can guide you through your own replication projects, ensuring a more resilient and reliable PostgreSQL environment.

Happy replicating!