In modern database architectures, it’s common to maintain separate environments for transactional operations and analytical workloads. One challenging scenario is setting up selective logical replication from a primary server with a non-partitioned table to an analytics server with a partitioned table, while maintaining different data retention policies on each side.
In this article, we’ll explore how to implement selective logical replication from a non-partitioned source table to a partitioned target table, with DELETE operations excluded from replication.
The Use Case
Our scenario involves:
- Primary Server: Maintains 3 months of transactional data in a non-partitioned table.
- Analytics Server: Retains 1 year of data in a partitioned table for analytics.
- Replication Strategy: Only INSERT and UPDATE operations replicate; DELETE operations are excluded.
This approach ensures that the Analytics server retains historical data for reporting purposes while the primary server maintains only recent transactions for optimal performance.
Implementation Steps
1. Creating the Non-Partitioned Table on Primary Server
On the primary server, we create a standard non-partitioned table:
CREATE TABLE public.orders (
order_id bigint NOT NULL,
customer_id bigint NOT NULL,
order_status varchar(50),
order_date timestamp NOT NULL,
total_amount numeric(12,2),
shipping_address text,
CONSTRAINT orders_pkey PRIMARY KEY (order_id)
);
Let’s insert some sample data:
-- Inserting only 5 records for testing purposes
INSERT INTO orders VALUES
(1001, 5001, 'COMPLETED', '2025-01-05 10:30:00', 299.99, '123 Main St, New York'),
(1002, 5002, 'PENDING', '2025-01-05 11:15:00', 149.50, '456 Oak Ave, Boston'),
(1003, 5003, 'SHIPPED', '2025-01-05 12:00:00', 599.00, '789 Pine Rd, Chicago'),
(1004, 5001, 'COMPLETED', '2025-01-05 14:30:00', 89.99, '123 Main St, New York'),
(1005, 5004, 'PROCESSING', '2025-01-05 15:45:00', 1299.00, '321 Elm St, Seattle');
2. Creating the Partitioned Table on Analytics Server
On the Analytics server, we create the same table structure but with partitioning by range on the order_date column:
CREATE TABLE public.orders (
order_id bigint NOT NULL,
customer_id bigint NOT NULL,
order_status varchar(50),
order_date timestamp NOT NULL,
total_amount numeric(12,2),
shipping_address text,
CONSTRAINT orders_pkey PRIMARY KEY (order_id, order_date)
) PARTITION BY RANGE (order_date);
Create partitions for data management:
-- Create monthly partitions covering the replicated data (the sample orders are dated January 2025)
CREATE TABLE orders_2025_01 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE orders_2025_02 PARTITION OF orders
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
CREATE TABLE orders_2025_03 PARTITION OF orders
FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');
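If you want to confirm which partitions are attached, pg_partition_tree (available since PostgreSQL 12) lists them; this is an optional check, not part of the setup:
-- Optional: list the leaf partitions currently attached to orders.
SELECT relid AS partition
FROM pg_partition_tree('public.orders')
WHERE isleaf;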
Note: The partition key (order_date) must be included in the primary key constraint; PostgreSQL requires every primary key on a partitioned table to contain all partition key columns. This composite key also drives the replica identity requirement described in the next step, so it is critical for replication between non-partitioned and partitioned tables to work correctly.
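For illustration, here is what happens if the partition key is left out; the table name orders_bad_example is hypothetical and only used to demonstrate the failure:
-- This definition is rejected: the primary key omits the partition key column
-- order_date, and PostgreSQL reports that the constraint must include all
-- partitioning columns.
CREATE TABLE public.orders_bad_example (
    order_id bigint NOT NULL,
    order_date timestamp NOT NULL,
    CONSTRAINT orders_bad_pkey PRIMARY KEY (order_id)
) PARTITION BY RANGE (order_date);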
3. Setting Replica Identity on Primary Server
For replication between non-partitioned and partitioned tables, we must set the replica identity to FULL:
ALTER TABLE orders REPLICA IDENTITY FULL;
Verify the setting:
SELECT relname, relreplident
FROM pg_class
WHERE relname = 'orders';
The relreplident column should show f for FULL.
Why replica identity FULL?
The subscriber identifies rows by its primary key (order_id, order_date), but the publisher’s default replica identity covers only order_id. With REPLICA IDENTITY FULL, the publisher sends the complete old row for every UPDATE, so the subscriber has all the columns it needs to locate the matching row and route it to the correct partition.
4. Creating the Publication
Create a publication for the orders table and configure it to exclude DELETE operations:
-- Create the publication
CREATE PUBLICATION orders_pub FOR TABLE orders;
-- Configure to replicate only INSERT and UPDATE
ALTER PUBLICATION orders_pub SET (publish = 'insert,update');
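The same configuration can also be achieved in a single statement, equivalent to the two commands above:
-- One-step alternative (drop orders_pub first if it already exists)
CREATE PUBLICATION orders_pub FOR TABLE orders
    WITH (publish = 'insert, update');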
Verify the configuration:
SELECT pubname, pubinsert, pubupdate, pubdelete, pubtruncate
FROM pg_publication
WHERE pubname = 'orders_pub';
Output
pubname | pubinsert | pubupdate | pubdelete | pubtruncate
------------+-----------+-----------+-----------+-------------
orders_pub | t | t | f | f
5. Creating the Subscription on Analytics Server
On the Analytics server, create the subscription:
CREATE SUBSCRIPTION orders_sub
CONNECTION 'host=<Primary_IP_addr> port=<port> user=<replication_user> password=<password> dbname=<db_name>'
PUBLICATION orders_pub
WITH (copy_data = true);
Use copy_data = true for initial data synchronization. If data already exists on the analytics server, use copy_data = false.
Verify the subscription status:
SELECT subname, subenabled, subpublications
FROM pg_subscription
WHERE subname = 'orders_sub';
Testing the Replication
Initial State Verification
Primary Server:
SELECT order_id, customer_id, order_status, order_date, total_amount
FROM orders
ORDER BY order_id;
order_id | customer_id | order_status | order_date | total_amount
----------+-------------+--------------+----------------------+--------------
1001 | 5001 | COMPLETED | 2025-01-05 10:30:00 | 299.99
1002 | 5002 | PENDING | 2025-01-05 11:15:00 | 149.50
1003 | 5003 | SHIPPED | 2025-01-05 12:00:00 | 599.00
1004 | 5001 | COMPLETED | 2025-01-05 14:30:00 | 89.99
1005 | 5004 | PROCESSING | 2025-01-05 15:45:00 | 1299.00
Analytics Server:
SELECT order_id, customer_id, order_status, order_date, total_amount
FROM orders
ORDER BY order_id;
The data should match the primary server after initial synchronization.
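A quick sanity check after the initial copy is to compare row counts and date ranges on both servers; this is a minimal check, and a full reconciliation would compare row contents:
-- Run on both servers; the results should match right after the initial sync.
SELECT count(*) AS order_count,
       min(order_date) AS earliest_order,
       max(order_date) AS latest_order
FROM orders;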
Scenario 1: INSERT Operations
Insert a new order on the primary server:
INSERT INTO orders VALUES
(1006, 5005, 'PENDING', '2025-01-06 09:00:00', 449.99, '555 Beach Blvd, Miami');
Primary Server Query:
SELECT order_id, customer_id, order_status, total_amount
FROM orders
WHERE order_id = 1006;
order_id | customer_id | order_status | total_amount
----------+-------------+--------------+--------------
1006 | 5005 | PENDING | 449.99
Analytics Server Query:
SELECT order_id, customer_id, order_status, total_amount
FROM orders
WHERE order_id = 1006;
order_id | customer_id | order_status | total_amount
----------+-------------+--------------+--------------
1006 | 5005 | PENDING | 449.99
Result: The record automatically appears on the Analytics server, demonstrating successful INSERT replication.
Scenario 2: UPDATE Operations
Update an order status on the primary:
UPDATE orders
SET order_status = 'SHIPPED', total_amount = 459.99
WHERE order_id = 1006;
Primary Server Query:
SELECT order_id, order_status, total_amount
FROM orders
WHERE order_id = 1006;
order_id | order_status | total_amount
----------+--------------+--------------
1006 | SHIPPED | 459.99
Analytics Server Query:
SELECT order_id, order_status, total_amount
FROM orders
WHERE order_id = 1006;
order_id | order_status | total_amount
----------+--------------+--------------
1006 | SHIPPED | 459.99
Result: The change replicates to the Analytics server, confirming UPDATE replication works as expected.
Scenario 3: DELETE Operations
Delete an order from the primary:
DELETE FROM orders WHERE order_id = 1002;
Primary Server Query:
SELECT order_id, customer_id, order_status
FROM orders
WHERE order_id = 1002;
order_id | customer_id | order_status
----------+-------------+--------------
(0 rows)
Analytics Server Query:
SELECT order_id, customer_id, order_status
FROM orders
WHERE order_id = 1002;
order_id | customer_id | order_status
----------+-------------+--------------
1002 | 5002 | PENDING
(1 row)
Result: The record is deleted from the primary but remains on the analytics server. This is the intended behavior since we excluded DELETE from the publication, allowing the analytics server to retain historical data for analytics and compliance.
Critical Considerations
1. Primary Key Requirements
When replicating to a partitioned table, the partition key must be part of the primary key. In our example:
- Primary Server: PRIMARY KEY (order_id)
- Analytics Server: PRIMARY KEY (order_id, order_date)
This composite key ensures PostgreSQL can route updates to the correct partition on the Analytics server.
2. Replica Identity FULL
Setting replica identity to FULL is mandatory for this setup. Without it, UPDATE operations fail with an error like:
ERROR: publisher did not send replica identity column expected by the
logical replication target relation "public.orders"
However, REPLICA IDENTITY FULL comes with a performance overhead:
| Replica Identity | Data Logged | Performance Impact |
|---|---|---|
| DEFAULT | Only primary key columns | Low overhead |
| FULL | All columns in the row | Higher WAL volume, increased I/O |
3. Performance Impact
The replica identity FULL setting means:
- Increased WAL Volume: Each UPDATE logs all column values (not just changed ones)
- Higher Network Bandwidth: More data transmitted during replication
- Additional I/O Overhead: More disk writes on the primary server
Example: Updating a single column in a 20-column table logs the complete old row (all 20 columns) with REPLICA IDENTITY FULL, versus only the old primary key values with DEFAULT.
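A rough psql sketch for measuring this, assuming a quiet test instance so unrelated activity does not skew the numbers: capture the WAL position before and after a single UPDATE and compare the difference under each replica identity setting.
-- Remember the current WAL position in a psql client-side variable.
SELECT pg_current_wal_lsn() AS before_lsn \gset
-- A single-column update on the table under test.
UPDATE orders SET order_status = 'SHIPPED' WHERE order_id = 1001;
-- WAL generated since the captured position.
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), :'before_lsn')) AS wal_generated;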
4. Data Consistency Considerations
Due to different retention policies and excluded DELETE operations, data divergence is expected:
| Aspect | Primary Server | Analytics Server |
|---|---|---|
| Data Retention | 3 months | 1 year |
| DELETE Operations | Executed | Not replicated |
| Use Case | Transactional | Analytics/Reporting |
The Analytics server will contain orders that have been purged from the primary server, which is the desired outcome for historical reporting and compliance.
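For illustration, the primary’s 3-month retention could be enforced with a scheduled purge like the one below (a hypothetical statement; scheduling and batching are up to you). Because DELETE is not published, the purged rows remain on the Analytics server:
-- Hypothetical retention job on the primary server (e.g. run via cron or pg_cron);
-- these deletes are not replicated, so the Analytics server keeps the history.
DELETE FROM orders
WHERE order_date < now() - interval '3 months';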
5. Partition Management
Pre-create partitions on the Analytics server to avoid errors:
-- Create future partitions in advance
CREATE TABLE orders_2025_04 PARTITION OF orders
FOR VALUES FROM ('2025-04-01') TO ('2025-05-01');
CREATE TABLE orders_2025_05 PARTITION OF orders
FOR VALUES FROM ('2025-05-01') TO ('2025-06-01');
If a partition doesn’t exist when data arrives, replication will fail with:
ERROR: no partition of relation "orders" found for row
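If it fits your retention model, a DEFAULT partition on the Analytics server is a simple safety net against this failure mode:
-- Optional catch-all partition; rows matching no monthly range land here
-- instead of breaking replication.
CREATE TABLE orders_default PARTITION OF orders DEFAULT;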
Monitoring and Troubleshooting
Monitor Replication Health
Check replication lag on Primary server:
SELECT client_addr, state, sync_state,
pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) AS send_lag,
pg_wal_lsn_diff(sent_lsn, write_lsn) AS write_lag
FROM pg_stat_replication;
Check subscription status on Analytics server:
SELECT subname, pid, received_lsn, latest_end_lsn,
latest_end_time, last_msg_receipt_time
FROM pg_stat_subscription;
Common Issues and Solutions
Issue 1: Replication slot growing indefinitely
- Cause: Subscription disabled or connection issues
- Solution: Monitor pg_replication_slots and drop unused slots (see the query sketch after this list).
Issue 2: Missing partitions
- Cause: New data arrives for a non-existent partition
- Solution: Create partitions proactively or use a DEFAULT partition (shown earlier)
Issue 3: High replication lag
- Cause: REPLICA IDENTITY FULL overhead or network latency
- Solution: Optimize the network, increase the number of replication workers, or reconsider whether full replica identity is necessary
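A query sketch for Issue 1: a slot whose restart_lsn lags far behind the current WAL position is retaining WAL on disk and deserves attention.
-- Inactive slots with a large retained_wal value point to a disabled or stalled subscription.
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;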
Best Practices
- Monitor Replication Lag: Set up alerts when lag exceeds acceptable thresholds.
- Automate Partition Creation: Use cron jobs or pg_cron to create future partitions (see the sketch after this list).
- Document Retention Policies: Maintain clear documentation of data divergence expectations.
- Test Failover Scenarios: Regularly test subscription recovery after interruptions.
- Benchmark Performance: Measure WAL generation before and after enabling REPLICA IDENTITY FULL.
- Regular Audits: Periodically verify that critical business records exist on both servers.
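As one possible shape for the partition-automation item above, here is a hypothetical pg_cron job (assuming the pg_cron extension is installed on the Analytics server); the job name and schedule are illustrative only:
-- Runs at 00:00 on the 25th of every month and pre-creates next month's partition.
SELECT cron.schedule(
    'create-next-orders-partition',
    '0 0 25 * *',
    $job$
    DO $do$
    DECLARE
        next_month date := date_trunc('month', now()) + interval '1 month';
    BEGIN
        EXECUTE format(
            'CREATE TABLE IF NOT EXISTS orders_%s PARTITION OF orders
                 FOR VALUES FROM (%L) TO (%L)',
            to_char(next_month, 'YYYY_MM'),
            next_month,
            next_month + interval '1 month');
    END
    $do$;
    $job$
);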
Conclusion
Selective logical replication from non-partitioned to partitioned tables in PostgreSQL provides a powerful solution for maintaining separate transactional and analytical databases with different retention policies. While the setup requires careful attention to replica identity settings, primary key constraints, and partition management, it enables organizations to optimize both operational performance and historical data retention.
The key to success lies in:
- Understanding the performance implications of REPLICA IDENTITY FULL.
- Proactive partition management on the target server.
- Clear documentation of data retention differences.
- Regular monitoring of replication health.
With proper configuration and monitoring, this approach can significantly enhance your data architecture’s flexibility and efficiency, allowing you to maintain a lean operational database while preserving rich historical data for analytics and compliance.
Have you implemented similar replication strategies in your PostgreSQL environment? What challenges did you face with partitioned targets? Share your experiences in the comments below!
If you are new to these topics, check out this post to understand the concepts of replication and partitioning.
