In modern database architectures, it’s common to maintain separate environments for transactional operations and analytical workloads. One challenging scenario is setting up selective logical replication from a primary server with a non-partitioned table to an analytics server with a partitioned table, while maintaining different data retention policies on each side.
In this article, we’ll explore how to implement selective logical replication from a non-partitioned source table to a partitioned target table, with DELETE operations excluded from replication.
The Use Case
Our scenario involves:
- Primary Server: Maintains 3 months of transactional data in a non-partitioned table.
- Analytics Server: Retains 1 year of data in a partitioned table for analytics.
- Replication Strategy: Only INSERT and UPDATE operations replicate; DELETE operations are excluded.
This approach ensures that the Analytics server retains historical data for reporting purposes while the primary server maintains only recent transactions for optimal performance.
Implementation Steps
1. Creating the Non-Partitioned Table on Primary Server
On the primary server, we create a standard non-partitioned table:
CREATE TABLE public.orders (
order_id bigint NOT NULL,
customer_id bigint NOT NULL,
order_status varchar(50),
order_date timestamp NOT NULL,
total_amount numeric(12,2),
shipping_address text,
CONSTRAINT orders_pkey PRIMARY KEY (order_id)
);
Let’s insert some sample data:
-- Inserting only 5 records for testing purposes
INSERT INTO orders VALUES
(1001, 5001, 'COMPLETED', '2025-01-05 10:30:00', 299.99, '123 Main St, New York'),
(1002, 5002, 'PENDING', '2025-01-05 11:15:00', 149.50, '456 Oak Ave, Boston'),
(1003, 5003, 'SHIPPED', '2025-01-05 12:00:00', 599.00, '789 Pine Rd, Chicago'),
(1004, 5001, 'COMPLETED', '2025-01-05 14:30:00', 89.99, '123 Main St, New York'),
(1005, 5004, 'PROCESSING', '2025-01-05 15:45:00', 1299.00, '321 Elm St, Seattle');
2. Creating the Partitioned Table on Analytics Server
On the Analytics server, we create the same table structure but with partitioning by range on the order_date column:
CREATE TABLE public.orders (
order_id bigint NOT NULL,
customer_id bigint NOT NULL,
order_status varchar(50),
order_date timestamp NOT NULL,
total_amount numeric(12,2),
shipping_address text,
CONSTRAINT orders_pkey PRIMARY KEY (order_id, order_date)
) PARTITION BY RANGE (order_date);
Create partitions for data management:
-- Create monthly partitions covering the replicated data (the sample orders are dated January 2025)
CREATE TABLE orders_2025_01 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE orders_2025_02 PARTITION OF orders
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
CREATE TABLE orders_2025_03 PARTITION OF orders
FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');
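If you want to confirm which partitions are attached, pg_partition_tree (available since PostgreSQL 12) lists them; this is an optional check, not part of the setup:
-- Optional: list the leaf partitions currently attached to orders.
SELECT relid AS partition
FROM pg_partition_tree('public.orders')
WHERE isleaf;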
Note: The partition key (order_date) must be included in the primary key constraint; PostgreSQL requires every primary key on a partitioned table to contain all partition key columns. This composite key also drives the replica identity requirement described in the next step, so it is critical for replication between non-partitioned and partitioned tables to work correctly.
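For illustration, here is what happens if the partition key is left out; the table name orders_bad_example is hypothetical and only used to demonstrate the failure:
-- This definition is rejected: the primary key omits the partition key column
-- order_date, and PostgreSQL reports that the constraint must include all
-- partitioning columns.
CREATE TABLE public.orders_bad_example (
    order_id bigint NOT NULL,
    order_date timestamp NOT NULL,
    CONSTRAINT orders_bad_pkey PRIMARY KEY (order_id)
) PARTITION BY RANGE (order_date);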
3. Setting Replica Identity on Primary Server
For replication between non-partitioned and partitioned tables, we must set the replica identity to FULL:
ALTER TABLE orders REPLICA IDENTITY FULL;
Verify the setting:
SELECT relname, relreplident
FROM pg_class
WHERE relname = 'orders';
The relreplident column should show f for FULL.
Why replica identity FULL?
The subscriber identifies rows by its primary key (order_id, order_date), but the publisher’s default replica identity covers only order_id. With REPLICA IDENTITY FULL, the publisher sends the complete old row for every UPDATE, so the subscriber has all the columns it needs to locate the matching row and route it to the correct partition.
4. Creating the Publication
Create a publication for the orders table and configure it to exclude DELETE operations:
-- Create the publication
CREATE PUBLICATION orders_pub FOR TABLE orders;
-- Configure to replicate only INSERT and UPDATE
ALTER PUBLICATION orders_pub SET (publish = 'insert,update');
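The same configuration can also be achieved in a single statement, equivalent to the two commands above:
-- One-step alternative (drop orders_pub first if it already exists)
CREATE PUBLICATION orders_pub FOR TABLE orders
    WITH (publish = 'insert, update');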
Verify the configuration:
SELECT pubname, pubinsert, pubupdate, pubdelete, pubtruncate
FROM pg_publication
WHERE pubname = 'orders_pub';
Output
pubname | pubinsert | pubupdate | pubdelete | pubtruncate
------------+-----------+-----------+-----------+-------------
orders_pub | t | t | f | f
5. Creating the Subscription on Analytics Server
On the Analytics server, create the subscription:
CREATE SUBSCRIPTION orders_sub
CONNECTION 'host=<Primary_IP_addr> port=<port> user=<replication_user> password=<password> dbname=<db_name>'
PUBLICATION orders_pub
WITH (copy_data = true);
Use copy_data = true for initial data synchronization. If data already exists on the analytics server, use copy_data = false.
Verify the subscription status:
SELECT subname, subenabled, subpublications
FROM pg_subscription
WHERE subname = 'orders_sub';
Testing the Replication
Initial State Verification
Primary Server:
SELECT order_id, customer_id, order_status, order_date, total_amount
FROM orders
ORDER BY order_id;
order_id | customer_id | order_status | order_date | total_amount
----------+-------------+--------------+----------------------+--------------
1001 | 5001 | COMPLETED | 2025-01-05 10:30:00 | 299.99
1002 | 5002 | PENDING | 2025-01-05 11:15:00 | 149.50
1003 | 5003 | SHIPPED | 2025-01-05 12:00:00 | 599.00
1004 | 5001 | COMPLETED | 2025-01-05 14:30:00 | 89.99
1005 | 5004 | PROCESSING | 2025-01-05 15:45:00 | 1299.00
Analytics Server:
SELECT order_id, customer_id, order_status, order_date, total_amount
FROM orders
ORDER BY order_id;
The data should match the primary server after initial synchronization.
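A quick sanity check after the initial copy is to compare row counts and date ranges on both servers; this is a minimal check, and a full reconciliation would compare row contents:
-- Run on both servers; the results should match right after the initial sync.
SELECT count(*) AS order_count,
       min(order_date) AS earliest_order,
       max(order_date) AS latest_order
FROM orders;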
Scenario 1: INSERT Operations
Insert a new order on the primary server:
INSERT INTO orders VALUES
(1006, 5005, 'PENDING', '2025-01-06 09:00:00', 449.99, '555 Beach Blvd, Miami');
Primary Server Query:
SELECT order_id, customer_id, order_status, total_amount
FROM orders
WHERE order_id = 1006;
order_id | customer_id | order_status | total_amount
----------+-------------+--------------+--------------
1006 | 5005 | PENDING | 449.99
Analytics Server Query:
SELECT order_id, customer_id, order_status, total_amount
FROM orders
WHERE order_id = 1006;
order_id | customer_id | order_status | total_amount
----------+-------------+--------------+--------------
1006 | 5005 | PENDING | 449.99
Result: The record automatically appears on the Analytics server, demonstrating successful INSERT replication.
Scenario 2: UPDATE Operations
Update an order status on the primary:
UPDATE orders
SET order_status = 'SHIPPED', total_amount = 459.99
WHERE order_id = 1006;
Primary Server Query:
SELECT order_id, order_status, total_amount
FROM orders
WHERE order_id = 1006;
order_id | order_status | total_amount
----------+--------------+--------------
1006 | SHIPPED | 459.99
Analytics Server Query:
SELECT order_id, order_status, total_amount
FROM orders
WHERE order_id = 1006;
order_id | order_status | total_amount
----------+--------------+--------------
1006 | SHIPPED | 459.99
Result: The change replicates to the Analytics server, confirming UPDATE replication works as expected.
Scenario 3: DELETE Operations
Delete an order from the primary:
DELETE FROM orders WHERE order_id = 1002;
Primary Server Query:
SELECT order_id, customer_id, order_status
FROM orders
WHERE order_id = 1002;
order_id | customer_id | order_status
----------+-------------+--------------
(0 rows)
Analytics Server Query:
SELECT order_id, customer_id, order_status
FROM orders
WHERE order_id = 1002;
order_id | customer_id | order_status
----------+-------------+--------------
1002 | 5002 | PENDING
(1 row)
Result: The record is deleted from the primary but remains on the analytics server. This is the intended behavior since we excluded DELETE from the publication, allowing the analytics server to retain historical data for analytics and compliance.
Critical Considerations
1. Primary Key Requirements
When replicating to a partitioned table, the partition key must be part of the primary key. In our example:
- Primary Server: PRIMARY KEY (order_id)
- Analytics Server: PRIMARY KEY (order_id, order_date)
This composite key ensures PostgreSQL can route updates to the correct partition on the Analytics server.
2. Replica Identity FULL
Setting replica identity to FULL is mandatory for this setup. Without it, UPDATE operations fail with an error like:
ERROR: publisher did not send replica identity column expected by the
logical replication target relation "public.orders"
However, REPLICA IDENTITY FULL comes with a performance overhead:
| Replica Identity | Data Logged | Performance Impact |
|---|---|---|
| DEFAULT | Only primary key columns | Low overhead |
| FULL | All columns in the row | Higher WAL volume, increased I/O |
3. Performance Impact
The replica identity FULL setting means:
- Increased WAL Volume: Each UPDATE logs all column values (not just changed ones)
- Higher Network Bandwidth: More data transmitted during replication
- Additional I/O Overhead: More disk writes on the primary server
Example: Updating a single column in a 20-column table logs the complete old row (all 20 columns) with REPLICA IDENTITY FULL, versus only the old primary key values with DEFAULT.
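A rough psql sketch for measuring this, assuming a quiet test instance so unrelated activity does not skew the numbers: capture the WAL position before and after a single UPDATE and compare the difference under each replica identity setting.
-- Remember the current WAL position in a psql client-side variable.
SELECT pg_current_wal_lsn() AS before_lsn \gset
-- A single-column update on the table under test.
UPDATE orders SET order_status = 'SHIPPED' WHERE order_id = 1001;
-- WAL generated since the captured position.
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), :'before_lsn')) AS wal_generated;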
4. Data Consistency Considerations
Due to different retention policies and excluded DELETE operations, data divergence is expected:
| Aspect | Primary Server | Analytics Server |
|---|---|---|
| Data Retention | 3 months | 1 year |
| DELETE Operations | Executed | Not replicated |
| Use Case | Transactional | Analytics/Reporting |
The Analytics server will contain orders that have been purged from the primary server, which is the desired outcome for historical reporting and compliance.
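For illustration, the primary’s 3-month retention could be enforced with a scheduled purge like the one below (a hypothetical statement; scheduling and batching are up to you). Because DELETE is not published, the purged rows remain on the Analytics server:
-- Hypothetical retention job on the primary server (e.g. run via cron or pg_cron);
-- these deletes are not replicated, so the Analytics server keeps the history.
DELETE FROM orders
WHERE order_date < now() - interval '3 months';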
5. Partition Management
Pre-create partitions on the Analytics server to avoid errors:
-- Create future partitions in advance
CREATE TABLE orders_2025_04 PARTITION OF orders
FOR VALUES FROM ('2025-04-01') TO ('2025-05-01');
CREATE TABLE orders_2025_05 PARTITION OF orders
FOR VALUES FROM ('2025-05-01') TO ('2025-06-01');
If a partition doesn’t exist when data arrives, replication will fail with:
ERROR: no partition of relation "orders" found for row
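If it fits your retention model, a DEFAULT partition on the Analytics server is a simple safety net against this failure mode:
-- Optional catch-all partition; rows matching no monthly range land here
-- instead of breaking replication.
CREATE TABLE orders_default PARTITION OF orders DEFAULT;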
Monitoring and Troubleshooting
Monitor Replication Health
Check replication lag on Primary server:
SELECT client_addr, state, sync_state,
pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) AS send_lag,
pg_wal_lsn_diff(sent_lsn, write_lsn) AS write_lag
FROM pg_stat_replication;
Check subscription status on Analytics server:
SELECT subname, pid, received_lsn, latest_end_lsn,
latest_end_time, last_msg_receipt_time
FROM pg_stat_subscription;
Common Issues and Solutions
Issue 1: Replication slot growing indefinitely
- Cause: Subscription disabled or connection issues
- Solution: Monitor pg_replication_slots and drop unused slots (see the query sketch after this list).
Issue 2: Missing partitions
- Cause: New data arrives for a non-existent partition
- Solution: Create partitions proactively or use a DEFAULT partition (shown earlier)
Issue 3: High replication lag
- Cause: REPLICA IDENTITY FULL overhead or network latency
- Solution: Optimize the network, increase the number of replication workers, or reconsider whether full replica identity is necessary
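A query sketch for Issue 1: a slot whose restart_lsn lags far behind the current WAL position is retaining WAL on disk and deserves attention.
-- Inactive slots with a large retained_wal value point to a disabled or stalled subscription.
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;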
Best Practices
- Monitor Replication Lag: Set up alerts when lag exceeds acceptable thresholds.
- Automate Partition Creation: Use cron jobs or pg_cron to create future partitions (see the sketch after this list).
- Document Retention Policies: Maintain clear documentation of data divergence expectations.
- Test Failover Scenarios: Regularly test subscription recovery after interruptions.
- Benchmark Performance: Measure WAL generation before and after enabling REPLICA IDENTITY FULL.
- Regular Audits: Periodically verify that critical business records exist on both servers.
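As one possible shape for the partition-automation item above, here is a hypothetical pg_cron job (assuming the pg_cron extension is installed on the Analytics server); the job name and schedule are illustrative only:
-- Runs at 00:00 on the 25th of every month and pre-creates next month's partition.
SELECT cron.schedule(
    'create-next-orders-partition',
    '0 0 25 * *',
    $job$
    DO $do$
    DECLARE
        next_month date := date_trunc('month', now()) + interval '1 month';
    BEGIN
        EXECUTE format(
            'CREATE TABLE IF NOT EXISTS orders_%s PARTITION OF orders
                 FOR VALUES FROM (%L) TO (%L)',
            to_char(next_month, 'YYYY_MM'),
            next_month,
            next_month + interval '1 month');
    END
    $do$;
    $job$
);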
Conclusion
Selective logical replication from non-partitioned to partitioned tables in PostgreSQL provides a powerful solution for maintaining separate transactional and analytical databases with different retention policies. While the setup requires careful attention to replica identity settings, primary key constraints, and partition management, it enables organizations to optimize both operational performance and historical data retention.
The key to success lies in:
- Understanding the performance implications of REPLICA IDENTITY FULL.
- Proactive partition management on the target server.
- Clear documentation of data retention differences.
- Regular monitoring of replication health.
With proper configuration and monitoring, this approach can significantly enhance your data architecture’s flexibility and efficiency, allowing you to maintain a lean operational database while preserving rich historical data for analytics and compliance.
Have you implemented similar replication strategies in your PostgreSQL environment? What challenges did you face with partitioned targets? Share your experiences in the comments below!
If you are new to these topics, check out this post to understand the concepts of replication and partitioning.
