PostgreSQL’s logical replication is handled through publications and subscriptions. Publications define what data to replicate, while subscriptions specify which server receives the updates.
Initial synchronization ensures the subscriber has all the data. Common methods include:
Snapshot: Copies all data at once. Simple but slow for large datasets.
Base Backup and Continuous Archiving: Makes a base copy and then tracks changes. More suitable for large datasets.
However, when dealing with massive databases in RDS PostgreSQL, such as those in the terabyte range, using pg_basebackup for base backup might not work due to security restrictions on file systems. pg_basebackup requires deep file system access, which is limited in RDS for security reasons. In such cases, replication needs to be set up using the pg_dump utility instead.
Before proceeding, ensure the following configuration parameters are set and restart the cluster:
max_replication_slots: 10 max_wal_senders: 10 track_commit_timestamp: on wal_level: logical
Next, create a publication and add tables to it in the source database:
CREATE PUBLICATION lsr_sync_01;
ALTER PUBLICATION lsr_sync_01 ADD TABLE sales_data;
ALTER PUBLICATION lsr_sync_01 ADD TABLE customer_info;
ALTER PUBLICATION lsr_sync_01 ADD TABLE transactions;
Create a replication slot:
To create a replication connection to the database and establish a replication slot, follow these steps:
Open a separate terminal session.
Keep the session active, preferably using a screen session.
Run the following command:
[postgres@primary ~]$ psql -h primary -U repmgr "dbname=osdb_lsr replication=database"
psql (10.23, server 14.10)
WARNING: psql major version 10, server major version 14.
Some psql features might not work.
Type "help" for help.
osdb_lsr=# CREATE_REPLICATION_SLOT lsr_sync_01 LOGICAL pgoutput;
slot_name | consistent_point | snapshot_name | output_plugin
-------------+------------------+---------------------+---------------
lsr_sync_01 | 0/C000110 | 00000003-00000002-1 | pgoutput
(1 row)
Now let's proceed with backing up the database using pg_dump.
pg_dump -h primary -U repmgr -t sales_data -t customer_info -t transactions -Fd -f osdb_lsr_intial_sync -j 2 --no-publications --no-subscriptions --snapshot=00000003-00000002-1 --compress=0 -v osdb_lsr
Restore the backup on the target for LSR:
pg_restore -h standby -U repmgr -Fd -j 2 -v -d osdb_lsr osdb_lsr_intial_sync
Create a subscription:
Create the subscription on the standby server, referring to the primary database, but keep it in a disabled state.
CREATE SUBSCRIPTION lsr_sync_sub_01 CONNECTION 'host=primary port=5432 dbname=osdb_lsr user=repmgr password=*****' PUBLICATION lsr_sync_01
WITH (
copy_data = false,
create_slot = false,
enabled = false,
connect = true,
slot_name = 'lsr_sync_01'
);
Advance the replication origin:
Retrieve the external ID from the pg_subscription table and use it to advance the replay position to the consistent point captured during replication slot creation.
standby=# SELECT 'pg_'||oid::text AS "external_id" FROM pg_subscription WHERE subname = 'lsr_sync_sub_01';
external_id
-------------
pg_32119
(1 row)
Take the position of the LSN from the replication slot creation and make it advance to start the replication
standby=# SELECT pg_replication_origin_advance('pg_32119', '0/C000110') ;
pg_replication_origin_advance
-------------------------------
(1 row)
Enable the subscription:
ALTER SUBSCRIPTION lsr_sync_sub_01 ENABLE;
Check the replication status:
SELECT * FROM pg_stat_replication;
This series of steps helps to establish logical replication in PostgreSQL, ensuring data synchronization between databases efficiently.