PostgreSQL high availability

How pgEdge delivers high availability in PostgreSQL

Why is Postgres HA important?

  • Economic stakes - Even brief downtime can create considerable financial repercussions, especially in businesses with real-time data operations, like financial trading platforms, e-commerce sites, and digital banking services.

  • Cloud uptime - Major cloud providers do have region-wide outages, in some cases measured in hours. Having easy and immediate failover to another region ensures that your application's availability is not tied to that of your cloud provider.

  • Brand confidence - For platforms used directly by consumers, downtime can dramatically affect user loyalty. If it happens repeatedly, the brand can be tarnished, and customers can go elsewhere.

  • Performance and scalability - Balancing incoming requests across several servers prevents any single server from becoming overwhelmed. As a result, overall performance improves, especially during high-traffic periods.

How pgEdge makes it easier to achieve high availability

pgEdge mitigates the most complex, costly, and failure-prone aspects of achieving multi-region PostgreSQL high availability:

  • Eliminates standby promotion complexity - Traditional PostgreSQL setups require manual or orchestrated promotion of standbys to primary when a region fails. pgEdge's native multi-master replication ensures that every node is always ready to accept traffic without promotion.

  • Simplifies application failover - Applications do not need to handle custom failover logic or track which region is currently the primary. They simply connect to the nearest available node, reducing development effort and potential sources of failure.

  • Enables near-zero-downtime maintenance - Routine upgrades, patches, and node replacements can be performed without impacting application availability, even across regions.

  • Reduces operational overhead - pgEdge Distributed PostgreSQL minimizes the need for custom scripts, cloud-specific HA tooling, and complex failover orchestration across regions.
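As a rough illustration of "connect to the nearest available node", the sketch below walks an ordered list of nodes and returns the first one that accepts a TCP connection. It is a minimal sketch, not pgEdge's client logic: the function names `tcp_reachable` and `pick_node`, the hostnames in the usage note, and the use of port 5432 are all assumptions made for the example.

```python
import socket
from typing import Callable, Optional, Sequence


def tcp_reachable(host: str, port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def pick_node(
    nodes: Sequence[str],
    is_reachable: Callable[[str], bool] = tcp_reachable,
) -> Optional[str]:
    """Return the first reachable node, or None if none respond.

    `nodes` is assumed to be ordered nearest first (hypothetical ordering
    chosen by the application or its deployment config).
    """
    for host in nodes:
        if is_reachable(host):
            return host
    return None
```

For example, `pick_node(["db-us-east.example.com", "db-eu-west.example.com"])` (hypothetical hostnames) would return the second host if the first is unreachable. In practice this job usually belongs to the load balancer, DNS, or service mesh layer, and libpq's multi-host connection strings offer a similar try-in-order behavior on the client side.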

pgEdge is available as a free 30-day Cloud trial or as a self-hosted platform download.

What pgEdge doesn't eliminate (and why that's okay)

pgEdge does not remove the need for:

  • A networking layer to re-route traffic when a node becomes unreachable (e.g., via load balancer, DNS, or service mesh).

  • Intra-region standby nodes (typically in cloud availability zones), which remain necessary for physical redundancy and fast local failover (handled by Patroni + etcd).

  • Observability and monitoring, which remain essential for diagnosing and managing distributed systems.

This is by design. These elements are common to all high-availability systems and are often tightly integrated with the deployment environment (cloud-native load balancers, private DNS, etc.). pgEdge focuses on removing the database-level barriers to geo-distributed high availability, not the fundamental need for traffic routing.

By solving the hard part - safe, consistent, and automatic multi-region database replication - pgEdge allows organizations to achieve enterprise-grade PostgreSQL availability with:

  • Lower engineering effort

  • Reduced downtime risk

  • Simplified operations

  • Less custom tooling

Your networking team handles traffic routing, just as they would in any HA setup, but without the added burden of managing database failover complexity.

[Table: uptime targets and allowable downtime]

The table above shows common uptime targets and the maximum allowable downtime for each.

Measuring Postgres high availability with RPO, RTO, and multiple nines

Organizations that require true high availability define their requirements as an uptime percentage (e.g. 99.99%) together with targets for RTO (Recovery Time Objective) and RPO (Recovery Point Objective).

Recovery Time Objective (RTO) is the maximum amount of time the system can be down before normal business operations resume, and is typically expressed in seconds or minutes.

Recovery Point Objective (RPO) is the maximum acceptable interval during which transactional data can be lost by the system, and is also typically expressed in seconds or minutes.

It is worth noting that while uptime percentage is typically calculated on an annual basis, some vendors base their Service Level Agreements (SLAs) on a monthly calculation.
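To make the difference between annual and monthly calculations concrete, the following sketch converts an uptime percentage into a downtime budget. The constants assume a 365-day year and a 30-day month, and the function name `allowed_downtime_seconds` is chosen for this example only.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # assumes a non-leap year
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumes a 30-day month


def allowed_downtime_seconds(uptime_percent: float, period_seconds: int) -> float:
    """Maximum downtime, in seconds, that still meets the uptime target."""
    return period_seconds * (1 - uptime_percent / 100)


if __name__ == "__main__":
    for label, percent, period in [
        ("four nines, annual", 99.99, SECONDS_PER_YEAR),
        ("five nines, annual", 99.999, SECONDS_PER_YEAR),
        ("four nines, monthly", 99.99, SECONDS_PER_MONTH),
    ]:
        minutes = allowed_downtime_seconds(percent, period) / 60
        print(f"{label}: {minutes:.1f} minutes")
    # four nines, annual: 52.6 minutes
    # five nines, annual: 5.3 minutes
    # four nines, monthly: 4.3 minutes
```

Under these assumptions, a 99.99% target allows roughly 52.6 minutes of downtime per year but only about 4.3 minutes in any single month, which is why the measurement period matters when comparing SLAs.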

Achieving high availability with pgEdge Distributed PostgreSQL

Combining pgEdge’s multi-master asynchronous logical replication with standard Postgres synchronous read replication is a powerful and flexible way of addressing high availability, especially Recovery Time Objective (RTO) and Recovery Point Objective (RPO) targets.

When deployed in a conforming architecture, a typical pgEdge PostgreSQL HA cluster is able to achieve four nines of availability on an annual basis. For a stable cluster with no planned maintenance activities (e.g. software upgrades or addition of new nodes) it is even possible to achieve five nines of availability, or four nines on a monthly basis.

Outside of planned maintenance, pgEdge can be used to achieve zero RTO and zero RPO.

Some might say that an RTO of zero is impossible, since no equipment or software can detect an outage in a zero increment of time (e.g. it might take a load balancer 30 seconds to determine a node is no longer reachable). At pgEdge, we consider any time period less than 1 minute to be zero.
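The 30-second load balancer example can be reasoned about with a simple model: a health checker that probes at a fixed interval and marks a node down after a number of consecutive failed probes. The sketch below is a simplified illustration under that assumption; the parameter names are hypothetical and real load balancers differ in how they count and time probes.

```python
def worst_case_detection_seconds(
    interval_s: float,
    unhealthy_threshold: int,
    probe_timeout_s: float = 0.0,
) -> float:
    """Approximate worst-case time to mark a node unreachable.

    Assumes the node fails just after a successful probe, so the checker
    needs `unhealthy_threshold` further probes, one per interval, and the
    last probe only fails once its timeout expires.
    """
    return interval_s * unhealthy_threshold + probe_timeout_s


def counts_as_zero_rto(outage_seconds: float, threshold_seconds: float = 60.0) -> bool:
    """Apply the rule stated above: anything under one minute counts as zero."""
    return outage_seconds < threshold_seconds
```

With a 10-second interval and three failed probes required, `worst_case_detection_seconds(10, 3)` gives 30 seconds, which `counts_as_zero_rto` treats as zero under the one-minute rule.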

Achieving PostgreSQL cluster high-availability

Achieving high availability in PostgreSQL is like building a fortress with multiple layers of protection. Such layers offer fail-safe mechanisms to ensure the continuity and resilience of the database operations.

  • Using zonal architecture for a high availability cluster - A pgEdge cluster is structured across multiple zones. Each zone operates independently yet remains connected to the other zones, forming a web of data exchange and backup. This zonal architecture is essential for Postgres load balancing and adds another layer of fault tolerance. If one zone goes down, your operation can be swiftly redirected to another zone, greatly reducing the risk of service disruption.

  • Multi-master replication with pgEdge’s Spock Extension - Zones are not isolated entities. They communicate and replicate data amongst themselves using multi-master replication, facilitated by pgEdge Spock (similar to EDB’s pgLogical, only it is fully open and fully standard). Since every zone is always up-to-date, any zone is ready to take over if another one fails.

  • Using etcd and Patroni for intra-zonal redundancy - Within each zone, there are two additional synchronous replicas, orchestrated by etcd and managed by Patroni. Therefore, even if one pgEdge node encounters an issue, there's an immediate backup within the same zone ready to take over.

It’s an orchestrated dance of redundancy and replication, ensuring that your data remains accessible and intact, even when faced with multiple points of failure.
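The benefit of layering redundancy can be illustrated with the textbook formula for redundant copies, where the service is up as long as at least one copy is up. This is a rough sketch only: it assumes failures are fully independent and ignores failover time, which real deployments cannot take for granted, and the input figures are illustrative, not pgEdge measurements.

```python
def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant units, each available with
    probability `single`, assuming independent failures.

    The group is unavailable only when every copy is down at once.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be a probability between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1 - (1 - single) ** copies


if __name__ == "__main__":
    # Illustrative input: each node is up 99% of the time.
    zone = combined_availability(0.99, 3)     # one node plus two replicas
    cluster = combined_availability(zone, 2)  # two zones
    print(f"zone: {zone:.6f}")        # zone: 0.999999
    print(f"cluster: {cluster:.12f}")
```

Even with modest per-node figures, each added layer shrinks the chance that everything is down at the same moment. Correlated failures, such as a region-wide outage, are exactly what the independence assumption leaves out, which is why zones are spread across regions.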

Summary: Key take-aways

  • 1. High Availability (HA) is a vital aspect of PostgreSQL database management for mission-critical applications, ensuring uninterrupted database services even in the face of server failure or maintenance activities.

  • 2. A well-designed HA system also integrates monitoring and detection tools that trigger failover procedures upon detecting primary server issues.

  • 3. Using read replicas and controlled switchover for planned maintenance contributes to improved query performance and data reliability.

  • 4. pgEdge mitigates the most complex, costly, and failure-prone aspects of achieving multi-region PostgreSQL high availability.

  • 5. Using pgEdge’s multi-master replication, zones are not isolated entities. They communicate and replicate data amongst themselves, facilitated by pgEdge Spock (similar to EDB’s pgLogical, only it is fully open and fully standard). Since every zone is always up-to-date, any zone is ready to take over if another one fails.

Your Choice

Self-hosted or Cloud deployment
pgEdge Cloud
  • Fully managed Database-as-a-Service (DBaaS)

  • Handles provisioning, security and monitoring

  • Access via web dashboard, CLI and API

  • Multi-cloud support available for AWS, Azure and Google Cloud

pgEdge Platform
  • All features of pgEdge Distributed PostgreSQL

  • Self-host on-premises or in cloud accounts (AWS, Azure, GCP, Equinix Metal, Akamai Linode)

  • For developer evaluations or production usage

  • Enterprise support available

Need additional information about PostgreSQL high availability?

Check out these resources:

Dive deeper into pgEdge

How to Unleash Ultra High Availability and Zero Downtime Maintenance with Distributed PostgreSQL

How Multi-Master Distributed Postgres Solves High Availability and Low Latency Challenges

PostgreSQL 17 - A Major Step Forward in Performance, Logical Replication and More

Get started today.

Experience the magic of pgEdge Distributed PostgreSQL now.