Setting Up Failover Slots in PostgreSQL 17
Introduction
PostgreSQL 17 introduces failover slots, which enhance high-availability setups. A replication slot keeps replication between nodes reliable and consistent by retaining the WAL that a consumer still needs; a failover slot extends that guarantee specifically during and after a failover.
Failover slots are a powerful feature that ensures logical replication can continue seamlessly, even after a failover to a standby server. Using failover slots allows logical replication slots to be automatically synchronized across primary and standby nodes, significantly reducing downtime and the need for manual intervention during a failover.
This guide will walk you through setting up a high-availability PostgreSQL cluster using the new failover slots feature. By the end, you'll have a robust replication setup capable of seamlessly handling a failover.
Why Failover Slots Matter from a Historical Perspective
Challenges in PostgreSQL 15
Replication Slots Tied to the Primary Node: In PostgreSQL 15, logical replication slots could only be created on the primary server. If the primary failed, all logical replication slots were lost, causing significant replication delays and potential data loss.
Manual Failover Management: During a failover, administrators had to manually recreate replication slots on the new primary server, which increased complexity, introduced errors, and prolonged downtime.
No Slot Synchronization: Standby servers had no knowledge of the logical replication slots on the primary, so a failover forced a complete reset of the replication streams.
Improvements in PostgreSQL 16
Minimal Logical Decoding
PostgreSQL 16 introduced a feature called minimal logical decoding on standbys:
Minimal Decoding on Standby: This allowed standby servers to decode WAL logs to prepare for logical replication, enabling pre-warmed slots for use if a failover occurred.
Faster Failover: By pre-decoding WAL changes on the standby, it was possible to reduce replication lag when promoting a standby to the primary. However, this still required some manual configuration to ensure smooth failover.
PostgreSQL 17: The Game-Changer - Failover Slots
Failover Slots: PostgreSQL 17 introduces failover slots, eliminating the need for manual intervention by automatically synchronizing logical replication slots between the primary and standby servers.
Automatic Synchronization: The new slot sync worker ensures that failover-enabled slots (failover = true) are always synchronized, even while the primary node is active.
Seamless Transition: Upon failover, the standby server can take over as the primary without losing any replication slots, ensuring zero data loss and continuous replication.
| Feature | PostgreSQL 15 | PostgreSQL 16 | PostgreSQL 17 |
|---|---|---|---|
| Logical Replication | Yes | Yes | Yes |
| Automatic Slot Synchronization | No | Minimal logical decoding on standby | Full failover slots |
| Failover Handling | Manual intervention needed | Pre-warmed slots on standby | Automatic failover slots |
| Slot Synchronization to Standby | Not supported | Minimal, requires configuration | Automatic with slotsync worker |
| High Availability for Logical Replication | Limited | Improved with minimal decoding | Seamless with failover slots |
Creating a High-Availability Cluster with Failover Slots
This section will walk you through creating a PostgreSQL high-availability cluster with failover slots. In our example, we'll use the following nodes:
NodeA (Primary Server)
NodeB (Physical Standby)
NodeC (Logical Subscriber)
Prerequisites
Before we start, ensure you have:
PostgreSQL 17 installed on all three nodes.
Passwordless SSH access between each node.
A basic understanding of PostgreSQL, PostgreSQL replication, and PostgreSQL configuration files.
Step 1: Configuring the Primary Node (NodeA)
1.1 Initialize the cluster on NodeA
After installing PostgreSQL on the primary node, initialize the cluster; you can use the following commands:
mkdir -p /home/pgedge/nodeA /home/pgedge/logs
initdb -D /home/pgedge/nodeA --no-locale -E UTF8
pg_ctl -D /home/pgedge/nodeA -l /home/pgedge/logs/nodeA.log start
1.2 Configure replication in the postgresql.conf file
After initializing the cluster, edit the postgresql.conf file, located by default at /home/pgedge/nodeA/postgresql.conf. Set the following parameter values:
wal_level = logical
max_replication_slots = 10
max_wal_senders = 10
synchronous_standby_names = '*'
synchronized_standby_slots = 'sb1_slot'
port = 5432
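Once the server is running with these settings, you can sanity-check them from psql. Note that changing wal_level only takes effect after a full server restart, not a reload:

```sql
-- Run on NodeA (psql -d postgres -p 5432) after the settings take effect
SHOW wal_level;                   -- should return: logical
SHOW synchronized_standby_slots;  -- should return: sb1_slot
```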
1.3 Update the pg_hba.conf File to Allow Replication Access
The pg_hba.conf file manages client authentication for the PostgreSQL server. Add the following entry to /home/pgedge/nodeA/pg_hba.conf to ensure access for a replication user:
host replication replicator 127.0.0.1/32 md5
Then, restart the server so that both the pg_hba.conf entry and the earlier postgresql.conf changes take effect (changing wal_level requires a restart, not just a reload):
pg_ctl -D /home/pgedge/nodeA restart
1.4 Create a Replication User
Then, log into PostgreSQL and create the replication user:
psql -d postgres -p 5432
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'replicator_password';
1.5 Create a Table and Set Up a Publication
Next, create a table, grant the replication user read access, and create an associated publication:
CREATE TABLE foo (c1 INT PRIMARY KEY);
GRANT SELECT ON foo TO replicator;
CREATE PUBLICATION mypub FOR TABLE foo;
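To confirm that the publication was created and covers the table, you can query the pg_publication_tables system view:

```sql
-- Lists the tables published by mypub; expect one row for public.foo
SELECT pubname, schemaname, tablename
FROM pg_publication_tables
WHERE pubname = 'mypub';
```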
Step 2: Configuring the Physical Standby (NodeB)
2.1 Create a Base Backup
After installing PostgreSQL on NodeB, use pg_basebackup to clone the primary cluster. Do not run initdb on NodeB; pg_basebackup requires an empty target directory:
mkdir -p /home/pgedge/nodeB
pg_basebackup -D /home/pgedge/nodeB -R -X stream -P -h localhost -p 5432 -U replicator
2.2 Configure postgresql.conf on NodeB
Modify the postgresql.conf file (located in /home/pgedge/nodeB/postgresql.conf), setting:
port = 5433
primary_conninfo = 'host=localhost port=5432 user=replicator password=replicator_password dbname=postgres application_name=sb1_slot'
primary_slot_name = 'sb1_slot'
hot_standby_feedback = on
sync_replication_slots = on
After editing the configuration file, start NodeB:
pg_ctl -D /home/pgedge/nodeB -l /home/pgedge/logs/nodeB.log start
2.3 Enable Failover Slot Synchronization
Use the psql client to log in to NodeB:
psql -d postgres -p 5433
Then, use the following statements to configure replication for NodeB:
ALTER SYSTEM SET sync_replication_slots = on;
ALTER SYSTEM SET hot_standby_feedback = on;
ALTER SYSTEM SET synchronized_standby_slots = 'sb1_slot';
Exit the psql client and restart NodeB:
pg_ctl -D /home/pgedge/nodeB restart
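Once NodeB restarts, you can confirm from the primary that the standby is connected and streaming; for example, in psql on NodeA:

```sql
-- On NodeA (port 5432): the standby should show state 'streaming'
SELECT application_name, state, sync_state
FROM pg_stat_replication;
```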
2.4 Verify Slot Synchronization
Then, reconnect to NodeB with psql and verify slot synchronization. Note that synced failover slots will only appear here once a failover-enabled subscription exists, which is created in Step 3:
SELECT slot_name, failover, synced FROM pg_replication_slots;
Step 3: Setting Up the Logical Subscriber (NodeC)
3.1 Initialize the cluster and configure NodeC
After installing PostgreSQL, initialize the cluster; you can use the following commands:
mkdir -p /home/pgedge/nodeC
initdb -D /home/pgedge/nodeC --no-locale -E UTF8
Then, edit the /home/pgedge/nodeC/postgresql.conf file, setting the following parameter values:
wal_level = logical
max_replication_slots = 10
max_wal_senders = 10
sync_replication_slots = on
port = 5444
After editing the configuration file, start NodeC:
pg_ctl -D /home/pgedge/nodeC -l /home/pgedge/logs/nodeC.log start
3.2 Create a Subscription on NodeC
Log in to NodeC with psql -d postgres -p 5444, create the target table (logical replication does not create tables automatically), and then create a subscription with failover enabled:
CREATE TABLE foo (c1 INT PRIMARY KEY);
CREATE SUBSCRIPTION foosub CONNECTION 'dbname=postgres host=localhost port=5432 user=replicator password=replicator_password' PUBLICATION mypub WITH (failover = true);
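After creating the subscription, you can confirm on the primary that the slot it created is failover-enabled (the slot name defaults to the subscription name):

```sql
-- On NodeA (port 5432): failover should be t for the subscription's slot
SELECT slot_name, slot_type, failover
FROM pg_replication_slots
WHERE slot_name = 'foosub';
```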
Step 4: Simulating Failover and Ensuring Continuity
You can use the following commands to simulate a failover to confirm that replication continues and data integrity is preserved.
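Before simulating the failure, it is worth confirming that replication works end to end; the values below are just examples:

```sql
-- On NodeA (primary, port 5432)
INSERT INTO foo VALUES (1), (2);

-- On NodeC (subscriber, port 5444): the rows should arrive shortly
SELECT * FROM foo;
```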
4.1 Simulating a Failover
Use the following commands to simulate a failure of NodeA and then promote NodeB from standby to primary:
pg_ctl -D /home/pgedge/nodeA stop
pg_ctl -D /home/pgedge/nodeB promote
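After issuing the promote command, you can verify that NodeB has left recovery mode:

```sql
-- On NodeB (port 5433): returns f once promotion is complete
SELECT pg_is_in_recovery();
```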
4.2 Update the Subscription on NodeC
After promoting NodeB, log in to NodeC and update the subscription's connection string to point at NodeB, the new primary:
ALTER SUBSCRIPTION foosub DISABLE;
ALTER SUBSCRIPTION foosub CONNECTION 'dbname=postgres host=localhost port=5433 user=replicator password=replicator_password';
ALTER SUBSCRIPTION foosub ENABLE;
4.3 Verify Data Continuity
To test replication, use psql to log in to NodeB, now the primary (psql -d postgres -p 5433), and insert new rows:
INSERT INTO foo VALUES (3), (4);
Then check replication on NodeC (psql -d postgres -p 5444):
SELECT * FROM foo;
Conclusion
PostgreSQL 17’s failover slot feature allows for seamless failover in logical replication environments. Following the steps outlined in this guide, you can create a high-availability cluster that ensures uninterrupted data flow, even during a primary server failure.
By optimizing configurations and leveraging PostgreSQL 17’s new capabilities, you can create a resilient and efficient database infrastructure for your mission-critical applications.