pgEdge's unique multi-master (active-active) replication runs across geographic regions, reduces latency, and ensures ultra-high availability.  pgEdge Platform v24.07, Constellation Release includes exception handling enhancements developed in collaboration with customers, partners, and the community.

Logical Replication Exception Handling and Logging

A new exception log table tracks replication errors to prevent them from blocking subsequent replication changes, ensuring a more robust and user-friendly experience when managing replication. Graceful error handling means that replication errors in unexpected situations are appropriately managed without interrupting overall system operation.

pgEdge Platform clusters running Spock 4.0 and later are (by default) configured with exception logging enabled. In this configuration, if any operation within a transaction block returns an ERROR, Spock writes all of the operations from the block to the exception log table without interruption to future transactions. Your cluster maintains active-active uninterrupted replication, allowing you to address the ERROR during a maintenance window, or at a time that suits your schedule.

You query the exception_log table at the PSQL command line; the table entry includes the information you need to decide how to remediate the ERROR; the time the error occurred, the error message, and the data values from the tuple in conflict:

demo=# SELECT * FROM spock.exception_log;
-[ RECORD 1]----+------------------------------------------
remote_origin	| 26863
remote_commit_ts   | 2024-07-02 19:10:04.317346+00
command_counter    | 1
remote_xid   	| 756
local_origin 	|
local_commit_ts    |
table_schema 	| public
table_name   	| acctg
operation    	| UPDATE
local_tup    	|
remote_old_tup     |
remote_new_tup     | [{"value": 3, "attname": "a", "atttype": "int4"}, {"value": 2, "attname": "b", "atttype": "int4"}]
ddl_statement	|
ddl_user     	|
error_message	| logical replication did not find row to be updated in replication target relation (public.acctg)
retry_errored_at   | 2024-07-02 19:10:05.058789+00

If you find that the level of logging or transaction retention is more than you need, you can modify the default logging behavior on a granular level, fine-tuning the exception log table entries to best suit your needs.

Replication Repair Mode

New functionality allows you to enable and disable repair mode, allowing you to opt-out of replicating changes made on a specific database node. This feature provides increased control over logical replication in a distributed PostgreSQL cluster while resolving errors or modifying the state of a database node. It also supports error remediation with external tools without affecting the entire cluster.

When it's time to fix a transaction, you can use repair mode to simplify the process. Repair mode works inside of a transaction block. While your node is in repair mode, replication between the node you're updating and other nodes in your cluster is halted without interrupting replication of other users' transactions. When your transaction block is committed or rolled back, the node is seamlessly returned to the cluster, and replicates the data modifications to your other nodes. The ability to easily exit and rejoin the cluster makes repair mode a powerful tool that can either be used to make data updates or to perform parallel operations on all nodes while taking the pressure off replication.

Using repair mode is simple  

To use repair mode, you'll make your cluster updates inside of a transaction block. After you BEGIN the transaction block, set spock.repair_mode to true to prevent the transaction from immediately replicating to other nodes:

demo=# BEGIN;
BEGIN
demo=*# SELECT spock.repair_mode('true');
-[ RECORD 1 ]----------
repair_mode | 0/1C5AFD0

demo=*# INSERT INTO employeeid VALUES (`3`,`Smith`,`sales`);
INSERT 0 1
demo=*# COMMIT;

When you end the transaction block (with either a COMMIT or ROLLBACK statement), spock.repair_mode is returned to false automatically and the node rejoins the cluster; future transactions from that user session are replicated to other nodes.

Conclusion

pgEdge remains the only fully distributed PostgreSQL solution that is open (source available) and completely based on the standard PostgreSQL database.  pgEdge Platform regularly updates functionality with features taken from the Spock extension (version 4.0) to help you preserve data integrity while keeping your pgEdge multi-master logical replication cluster robust and active-active. pgEdge also improves the tooling so you can more easily manage your pgEdge Platform or pgEdge Cloud clusters.

For more information about pgEdge tooling or to sign up for a cluster, visit the pgEdge website at: www.pgedge.com.