Introducing Snowflake Sequences in a Postgres Extension

In a PostgreSQL database, sequences provide a convenient way to generate a unique identifier, and are often used for key generation. From the community, PostgreSQL provides functions and SQL language to help manage sequence generation, but the sequences themselves are not without limitations in a multi-master environment. Snowflake sequences from pgEdge work seamlessly in a multi-master PostgreSQL cluster to remove those limitations so your data can thrive at the network edge.

Why are Sequences an Issue?

In a distributed multi-master database system, sequences can get complicated. Ensuring consistency and uniqueness across the nodes in your cluster is a problem if you use PostgreSQL sequences; the Snowflake extension steps up to automatically mitigate this issue.

PostgreSQL sequence values are prepared for assignment in a table in your PostgreSQL database; as each sequence value is used, the next sequence value nextval is incremented. Changes to the next available sequence value are not replicated to the other nodes in your replication cluster.

In a simple example, you might have a table on node n1, with 10 rows, each with a primary key that is assigned a sequence value from 1 to 10; the next prepared sequence value nextval on n1 will be 11. Rows are replicated from n1 to n2 without issue until you add a row on n2.

The PostgreSQL sequence value table on n2 has not been incrementing sequence values in step with the sequence value table on n1. When you add a row on n2, it will try to use the next available sequence value (nextval will be 1 if you haven't added a row on n2), and the INSERT will fail because a row with the primary key value of 1 already exists.

This disorder can be monitored and corrected by manually coordinating the PostgreSQL sequences between nodes in the cluster, but that quickly becomes complicated and potentially impacts the user experience as you add more nodes to the cluster.

Introducing Snowflake Sequences

An alternative to using PostgreSQL sequences is to use a guaranteed unique Snowflake sequence. Snowflake sequences are represented externally as bigint values. A Snowflake sequence is made up of:

The timestamp is a 41-bit unsigned value representing millisecond precision and an epoch of 2023-01-01.
A unique ID is allocated as a 12-bit unsigned value. This provides for 4096 unique IDs per millisecond, or 4 million IDs per second.
The node number is a 10-bit unique identifier of the PostgreSQL instance in a global cluster. This value must be set with the GUC snowflake.node in the postgresql.conf file.

This combination ensures that a unique identifier is always available; even the most aggressive allocation of sequences cannot exceed the current assignment capabilities. Should it be possible in the future to require more than 4096 Snowflakes per millisecond, the algorithm will bump the timestamp one millisecond into the future to keep Snowflake IDs unique.

You can use Snowflake functions from pgEdge to extrapolate data from a Snowflake sequence for auditing or for use in other transaction processing needs. Information about the node on which a transaction occurred, or the specific time a transaction occurred is built into your data and easily accessed when you use a Snowflake sequence.

Using Snowflake Sequences with pgEdge

pgEdge automatically installs and creates the snowflake extension when you install the pgEdge Platform. It is also automatically installed on all pgEdge Cloud Developer Edition databases.

The spock node-create command has been updated to set the Snowflake node id for you if you use the node naming convention n1, n2, n3 (up to n9). If you use another node naming convention, you will need to manually set snowflake.node to a unique value:

./pgedge spock node-create n2 'host=10.2.1.2 user=pgedge port=5432 dbname=demo' demo
node_create: 560818415
snowflake.node = 2

After creating all of your database objects, you can use a pgEdge Platform command to convert your existing sequences into snowflake sequences. When calling the spock sequence-convert function, pass the name of the sequence to be converted and the database name. Like other pgEdge Platform functions, you can include quoted wildcards to include all sequences in that database (‘*’) or to match a specific pattern like all sequences in a schema (‘public.*’).

./pgedge spock sequence-convert ‘public.*’ postgres

During the conversion, the data type of your sequence fields are changed to bigint; any existing values in your database remain the same. When you add new rows to your database, the new sequences will use the format of a snowflake sequence:

SELECT snowflake.format(id) FROM table1;
           id         |                          format
-------------------+-----------------------------------------------------------
 18014518154714241 | {"node": 1, "ts": "2023-10-16 18:47:12.257+00", "count": 0}
 18014518154714242 | {"node": 1, "ts": "2023-10-16 18:47:12.258+00", "count": 0}
(2 rows)

Getting Started

Snowflake sequences are just one benefit you get when using pgEdge Platform (self-hosted) or pgEdge Cloud (fully managed cloud service) to construct, manage, and host your distributed PostgreSQL databases. For more information about pgEdge, visit www.pgedge.com. To try it for yourself you can get started for free by signing up for pgEdge Cloud at https://www.pgedge.com/get-started/cloud , or download pgEdge Platform at https://www.pgedge.com/get-started/platform.