Inside Patroni’s replicatefrom: How cascading replicas self-heal when their upstream fails

If you run PostgreSQL at scale with Patroni, you’ve probably hit the point where every replica streaming directly from the primary starts to strain it: too many WAL senders, too much network fan-out, too much load on a node that’s supposed to be focused on writes. The answer is cascading replication, configured through the replicatefrom tag in Patroni. It’s simple on paper, but its failover behavior is not obvious. Once you understand it, the things you observe in production stop looking like bugs and start looking like the system working exactly as designed.

This post covers the concept, the failure/recovery behavior, and then walks through it live using patronictl list.

What cascading replication solves

In a normal Patroni cluster, every replica connects directly to the primary’s WAL stream. That’s fine for 3–5 nodes, but it doesn’t scale indefinitely, and it doesn’t help if you have replicas in a different rack, region, or network segment that you’d rather have replicate from a “local” node instead of crossing a WAN link to the primary every time.

Cascading replication lets you build a topology like:

node1 --> leader
node2 --> replica --> streams from node1
node3 --> replica --> streams from node1
node4 --> replica --> streams from node3 # cascading replica

Instead of connecting to the primary, the cascading replica (node4) connects to node3. node3 itself replicates from the primary and simply re-streams that WAL onward. This reduces load on the primary and lets you shape the replication topology to match your network layout.
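To make the fan-out reduction concrete, here is a minimal Python sketch that counts how many replicas each node has to serve in a flat layout versus the cascaded one from the example. The dictionaries and the function name are illustrative only; they are not part of Patroni.

```python
from collections import Counter

# Hypothetical model of the example topology: replica name -> node it streams from.
# The leader (node1) has no upstream, so it never appears as a key.
DIRECT = {"node2": "node1", "node3": "node1", "node4": "node1"}
CASCADED = {"node2": "node1", "node3": "node1", "node4": "node3"}


def wal_senders_per_node(upstreams):
    """Count how many replicas stream from each node."""
    return Counter(upstreams.values())


if __name__ == "__main__":
    for label, topology in (("direct", DIRECT), ("cascaded", CASCADED)):
        print(label, dict(wal_senders_per_node(topology)))
```

In this model the primary serves three streams in the flat layout and two in the cascaded one, with node3 picking up the third.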

The replicatefrom tag

You configure this per-node, in the tags section of that node’s Patroni config. The following is an excerpt from the Patroni configuration file on node4 from our example topology above:

tags:
  replicatefrom: node3
  nofailover: true
  noloadbalance: true 

This tells Patroni: when building this node’s primary_conninfo, point it at node3’s connection info instead of the cluster leader’s. Patroni rewrites the recovery configuration accordingly, and the replication slot for the cascading replica gets created and tracked on its immediate upstream (node3), not on the primary.

nofailover and noloadbalance are common companions here: you usually don’t want a cascading-only replica promoted, or hit by read-traffic load balancers, since its job is purely to offload replication fan-out.

It’s a single hop’s worth of indirection per node. You can chain it further, but each node only knows about its own immediate upstream, not the full topology.
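As a rough illustration of what "point primary_conninfo at node3" means, the sketch below builds a libpq-style connection string from the tagged upstream. It is a simplified model, not Patroni's code: the member registry, the IP addresses, the replicator user name, and the function name are all assumptions made up for the example.

```python
# Hypothetical member registry: name -> connection details.
# In a real cluster Patroni reads this from the DCS; these hosts are made up.
MEMBERS = {
    "node1": {"host": "10.0.0.1", "port": 5432},
    "node3": {"host": "10.0.0.3", "port": 5432},
}


def build_primary_conninfo(my_name, tags, members, leader):
    """Return a libpq-style conninfo string for the node to stream from.

    Uses the replicatefrom tag when it is set, otherwise the leader.
    The 'replicator' user is an assumed name for the replication role.
    """
    upstream = tags.get("replicatefrom") or leader
    target = members[upstream]
    return (
        f"host={target['host']} port={target['port']} "
        f"user=replicator application_name={my_name}"
    )


if __name__ == "__main__":
    tags = {"replicatefrom": "node3", "nofailover": True, "noloadbalance": True}
    print(build_primary_conninfo("node4", tags, MEMBERS, leader="node1"))
```

With the tag set, the string points at node3's address; with an empty tags dictionary it points at the leader instead.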

What happens when the upstream node fails

This is the behavior most people trip over the first time they see it.

When Patroni evaluates where a replica should stream from, it checks whether the node named in replicatefrom is actually present and healthy in the cluster. If it isn’t (say node3 is down), Patroni doesn’t leave the cascading replica dangling with a dead connection. It falls back to streaming directly from the current cluster leader instead.

This is deliberate, not accidental. The Patroni maintainers confirmed exactly this scenario on the project’s issue tracker: if a node configured as a replication source is lost, Patroni’s default fallback is to use the leader instead, and that’s intentionally how the feature is programmed to work (GitHub).

So for a 3-node cluster plus one cascading replica pointed at node3:

  1. Steady state: the cascading replica streams from node3.
  2. node3 fails: Patroni notices it’s no longer a valid, healthy upstream and, on the next HA loop iteration, rewrites the replica’s primary_conninfo to point at the current leader. Replication continues uninterrupted, just without the cascade hop.
  3. node3 recovers and catches up: once it’s healthy again, Patroni re-evaluates the topology and, because the tag still says replicatefrom: node3, switches the cascading replica back to it.

So automatic fallback to the leader on failure, and automatic re-pointing back on recovery, isn’t a misconfiguration. It’s Patroni protecting the cascading replica from going stale or losing its replication source entirely.
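The three stages above can be modeled in a few lines of Python. This is a simplified sketch of the decision as described in this post, not Patroni's actual source: the function name, the set of healthy members, and the stage labels are all hypothetical.

```python
def choose_upstream(my_name, tags, healthy_members, leader):
    """Pick the node to stream from on one HA loop iteration.

    Prefer the replicatefrom target when it is a healthy cluster member
    other than ourselves; otherwise fall back to the current leader.
    """
    wanted = tags.get("replicatefrom")
    if wanted and wanted != my_name and wanted in healthy_members:
        return wanted
    return leader


TAGS = {"replicatefrom": "node3", "nofailover": True, "noloadbalance": True}
ALL_NODES = {"node1", "node2", "node3", "node4"}

# One entry per stage: (label, set of members currently healthy).
STAGES = [
    ("steady state", ALL_NODES),
    ("node3 fails", ALL_NODES - {"node3"}),
    ("node3 recovers", ALL_NODES),
]


def walk_through():
    """Return the upstream node4 would choose at each stage."""
    return [
        (label, choose_upstream("node4", TAGS, healthy, leader="node1"))
        for label, healthy in STAGES
    ]


if __name__ == "__main__":
    for label, upstream in walk_through():
        print(f"{label}: node4 streams from {upstream}")
```

Because the tag never changes, the same function yields node3, then node1, then node3 again. The re-pointing on recovery needs no extra logic; it falls out of re-evaluating the same rule on every loop.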

Caveats worth knowing

  • Replication slots move too. Since slots for cascading replicas live on the immediate upstream, switching sources shifts slot management as well. If the upstream was down for a long stretch, check your WAL retention settings so slots aren’t dropped prematurely.
  • No flapping protection. If the upstream node is unstable, the cascading replica will follow it back and forth each time, driven by the same loop_wait cadence as the rest of the HA loop. There’s no dampening built in.
  • No hard isolation guarantee. If you need a replica to only ever stream from a specific node (never the leader, for network isolation reasons), replicatefrom alone can’t enforce that. The fallback-to-leader behavior can’t be turned off.
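To get a feel for the flapping caveat, the sketch below replays a sequence of health observations, one per HA loop iteration, and counts how often the cascading replica would change its replication source. It is a toy model under the assumption that the replica re-points on every health change; the function names and the sample sequence are invented for illustration.

```python
def replication_sources(upstream_healthy, preferred="node3", leader="node1"):
    """Map per-iteration health of the preferred upstream to the chosen source."""
    return [preferred if ok else leader for ok in upstream_healthy]


def count_switches(sources):
    """Count how many times the source changes between consecutive iterations."""
    return sum(1 for prev, cur in zip(sources, sources[1:]) if prev != cur)


if __name__ == "__main__":
    # Hypothetical unstable upstream: healthy on every other HA loop iteration.
    flapping = [True, False, True, False, True, False]
    sources = replication_sources(flapping)
    print(sources)
    print("switches:", count_switches(sources))
```

With no dampening, every health change becomes a source switch, so an upstream that alternates on each iteration forces a reconnect roughly once per loop_wait interval.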

Failover

Open the video alongside this post to see what changes at each stage of the failover.

Conclusion

If your goal is “reduce load on the primary, but always keep a working replication path,” this is the feature working as intended: graceful degradation to leader-streaming, and automatic restoration of the cascade once the upstream is healthy. patronictl list will show you declared intent (the Tags column) and overall cluster health, but to confirm actual live topology during a transition, cross-check pg_stat_replication on the relevant nodes. That’s the only way to see the real connection, not just the configured preference.
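One way to do that cross-check is to collect pg_stat_replication rows from each node and compare them with the declared replicatefrom tags. The sketch below works on rows you have already fetched, so it needs no database driver. It assumes each replica's application_name matches its Patroni member name, which is worth verifying in your own cluster. The helper names and the sample rows are hypothetical.

```python
# Query to run on each node; application_name and state are standard
# columns of the pg_stat_replication view.
QUERY = "SELECT application_name, state FROM pg_stat_replication"


def live_upstreams(rows_by_node):
    """Derive replica -> actual upstream from per-node query results.

    rows_by_node maps a node name to the (application_name, state) rows
    returned by QUERY on that node.
    """
    upstream_of = {}
    for node, rows in rows_by_node.items():
        for application_name, state in rows:
            if state == "streaming":
                upstream_of[application_name] = node
    return upstream_of


def find_mismatches(declared, live):
    """Return replicas whose live upstream differs from their replicatefrom tag."""
    return {
        replica: (wanted, live.get(replica))
        for replica, wanted in declared.items()
        if live.get(replica) != wanted
    }


if __name__ == "__main__":
    # Hypothetical snapshot taken while node3 is down: node4 has fallen
    # back to the leader, so it shows up in node1's view.
    rows_by_node = {
        "node1": [("node2", "streaming"), ("node4", "streaming")],
        "node3": [],
    }
    declared = {"node4": "node3"}
    print(find_mismatches(declared, live_upstreams(rows_by_node)))
```

A non-empty result means the configured preference and the real connection disagree at that moment, which is expected during a fallback and should clear once the upstream is healthy again.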

If your requirement is strict topology isolation (a replica that must never fall back to the leader), replicatefrom won’t give you that guarantee on its own, and you’d need to enforce it outside Patroni (e.g., network-level access rules preventing that replica from reaching the primary at all).
