Christophe Pettus: Failover Slots, Two Years On

Planet PostgreSQL · May 4, 2026

Why It Matters

Failover slots close a long‑standing replication gap, ensuring logical subscribers stay consistent after a failover, which is critical for high‑availability PostgreSQL deployments.

Key Takeaways

  • PostgreSQL 17 adds failover slots, along with the `sync_replication_slots` and `synchronized_standby_slots` settings.
  • Slots sync asynchronously; run `pg_sync_replication_slots()` on the standby and verify the result before promotion.
  • Logical changes now wait for the listed physical standbys to confirm the WAL, which adds latency and can reduce throughput.
  • PostgreSQL 19 introduces a dynamic `wal_level` and a `slotsync_skip_reason` column.
  • Enable failover slots now, but update failover runbooks before relying on them in production.

Pulse Analysis

The separation of logical replication and physical streaming has been a pain point for PostgreSQL users who need both high availability and real-time data distribution. Prior to version 17, a primary's logical slots were not copied to its standbys, so a failover left logical subscribers without a slot on the new primary, breaking ordering guarantees and forcing costly manual recovery. Version 17 adds a failover flag on logical slots and a slot-sync worker that copies slot state from the primary to standbys, so a promoted standby can resume logical replication without data loss. The copy is asynchronous, however, so the slot state on the standby must be verified before promotion.
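The pre-promotion verification can be sketched in Python. This is a minimal illustration under stated assumptions, not the article's own tooling: `StandbySlot` and `promotion_blockers` are hypothetical names, the rows are assumed to have been read from `pg_replication_slots` on the standby, and the primary's positions are assumed to come from the last values a monitoring system recorded. The readiness rule (synced, not temporary, not invalidated) follows the check suggested in the PostgreSQL documentation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


def parse_lsn(text: str) -> int:
    """Convert a pg_lsn string such as '0/3000060' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


@dataclass
class StandbySlot:
    """Hypothetical holder for one pg_replication_slots row read on the standby."""
    slot_name: str
    synced: bool
    temporary: bool
    invalidation_reason: Optional[str]
    confirmed_flush_lsn: Optional[str]


def promotion_blockers(
    primary_flush: Dict[str, str],
    standby_slots: List[StandbySlot],
    max_lag_bytes: int = 0,
) -> List[str]:
    """Return reasons not to promote; an empty list means no blocker was found.

    primary_flush maps each failover slot name to the last confirmed_flush_lsn
    observed on the primary (an assumption: recorded by monitoring).
    """
    by_name = {slot.slot_name: slot for slot in standby_slots}
    problems: List[str] = []
    for name, primary_lsn in sorted(primary_flush.items()):
        slot = by_name.get(name)
        if slot is None:
            problems.append(f"{name}: missing on standby")
            continue
        # A slot is usable after promotion only if it is a persistent,
        # synced copy that has not been invalidated.
        if not slot.synced or slot.temporary or slot.invalidation_reason is not None:
            problems.append(f"{name}: not sync-ready")
            continue
        if slot.confirmed_flush_lsn is None:
            problems.append(f"{name}: no confirmed_flush_lsn")
            continue
        lag = parse_lsn(primary_lsn) - parse_lsn(slot.confirmed_flush_lsn)
        if lag > max_lag_bytes:
            problems.append(f"{name}: {lag} bytes behind primary")
    return problems
```

A failover script could refuse to promote while this list is non-empty, or log it and require operator confirmation. The `max_lag_bytes` threshold is a policy choice, not something PostgreSQL prescribes.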

Operationally, the new machinery couples the two replication streams. The primary will not send changes from a failover-enabled logical slot to subscribers until every physical standby slot listed in `synchronized_standby_slots` has received and flushed the corresponding WAL. This safety net keeps subscribers from getting ahead of a standby that may be promoted, but it adds latency, especially when a standby's performance degrades. DBAs must monitor both WAL lag and slot sync status, and adjust failover automation, such as Patroni scripts, to call `pg_sync_replication_slots()` on the standby and confirm slot positions before triggering a promotion. Skipping these steps can leave a system that behaves correctly in normal operation but is vulnerable in the rare failure window.
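The hold-back rule can be modeled in a few lines of Python. This is a simplified sketch of the behavior described above, not PostgreSQL's implementation: `releasable_lsn` is a hypothetical name, and the per-slot positions are assumed to be the LSNs that each standby slot named in `synchronized_standby_slots` has confirmed.

```python
from typing import Dict, Optional


def parse_lsn(text: str) -> int:
    """Convert a pg_lsn string such as '0/4000000' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def releasable_lsn(
    primary_flush_lsn: str,
    standby_confirmed: Dict[str, Optional[str]],
) -> Optional[int]:
    """Highest LSN that may be sent to logical subscribers in this model.

    standby_confirmed maps each listed physical slot name to the LSN its
    standby has confirmed, or None if it has confirmed nothing yet.
    Returns None when any listed standby has no confirmed position,
    meaning logical changes are held back entirely.
    """
    limit = parse_lsn(primary_flush_lsn)
    for confirmed in standby_confirmed.values():
        if confirmed is None:
            return None
        # The slowest listed standby sets the ceiling for logical decoding.
        limit = min(limit, parse_lsn(confirmed))
    return limit
```

In this model a single lagging standby caps what every failover-enabled logical slot may send, which is why standby lag shows up as subscriber latency.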

PostgreSQL 19 refines the experience with a dynamic `wal_level` that automatically switches between `replica` and `logical` based on slot existence, removing a disruptive restart requirement. The new `slotsync_skip_reason` column surfaces why a slot sync was skipped, turning opaque log messages into actionable diagnostics. Together with the EXPLAIN IO option, administrators gain visibility into the throughput penalty imposed by the hold-back mechanism. For organizations running mixed replication topologies, the prudent path is to enable failover slots now, update runbooks, and plan to upgrade to 19 for the lower-friction enhancements.
