Introducing Snowflake Sequences in a Postgres Extension
Eliminating Conflicts and Enhancing Scalability in a Multi-master Distributed PostgreSQL Application
In a PostgreSQL database, sequences provide a convenient way to generate a unique identifier, and are often used for key generation. From the community, PostgreSQL provides functions and SQL language to help manage sequence generation, but the sequences themselves are not without limitations in a multi-master environment. Snowflake sequences from pgEdge work seamlessly in a multi-master PostgreSQL cluster to remove those limitations so your data can thrive at the network edge.
Why are Sequences an Issue?
In a distributed multi-master database system, sequences can get complicated. Ensuring consistency and uniqueness across the nodes in your cluster is a problem if you use PostgreSQL sequences; the Snowflake extension steps up to automatically mitigate this issue.
PostgreSQL sequence values are prepared for assignment in a table in your PostgreSQL database; as each sequence value is used, the next sequence value nextval
is incremented. Changes to the next available sequence value are not replicated to the other nodes in your replication cluster.
In a simple example, you might have a table on node n1
, with 10 rows, each with a primary key that is assigned a sequence value from 1 to 10; the next prepared sequence value nextval
on n1
will be 11. Rows are replicated from n1
to n2
without issue until you add a row on n2
.
The PostgreSQL sequence value table on n2
has not been incrementing sequence values in step with the sequence value table on n1
. When you add a row on n2
, it will try to use the next available sequence value (nextval
will be 1 if you haven't added a row on n2
), and the INSERT
will fail because a row with the primary key value of 1 already exists.
This disorder can be monitored and corrected by manually coordinating the PostgreSQL sequences between nodes in the cluster, but that quickly becomes complicated and potentially impacts the user experience as you add more nodes to the cluster.
Introducing Snowflake Sequences
An alternative to using PostgreSQL sequences is to use a guaranteed unique Snowflake sequence. Snowflake sequences are represented externally as bigint
values. A Snowflake sequence is made up of:
The timestamp is a 41-bit unsigned value representing millisecond precision and an epoch of 2023-01-01.
A unique ID is allocated as a 12-bit unsigned value. This provides for 4096 unique IDs per millisecond, or 4 million IDs per second.
The node number is a 10-bit unique identifier of the PostgreSQL instance in a global cluster. This value must be set with the GUC
snowflake.node
in thepostgresql.conf
file.
This combination ensures that a unique identifier is always available; even the most aggressive allocation of sequences cannot exceed the current assignment capabilities. Should it be possible in the future to require more than 4096 Snowflakes per millisecond, the algorithm will bump the timestamp one millisecond into the future to keep Snowflake IDs unique.
You can use Snowflake functions from pgEdge to extrapolate data from a Snowflake sequence for auditing or for use in other transaction processing needs. Information about the node on which a transaction occurred, or the specific time a transaction occurred is built into your data and easily accessed when you use a Snowflake sequence.
Using Snowflake Sequences with pgEdge
pgEdge automatically installs and creates the snowflake extension when you install the pgEdge Platform. It is also automatically installed on all pgEdge Cloud Developer Edition databases.
The spock node-create command has been updated to set the Snowflake node id for you if you use the node naming convention n1
, n2
, n3
(up to n9
). If you use another node naming convention, you will need to manually set snowflake.node to a unique value:
./pgedge spock node-create n2 'host=10.2.1.2 user=pgedge port=5432 dbname=demo' demo
node_create: 560818415
snowflake.node = 2
After creating all of your database objects, you can use a pgEdge Platform command to convert your existing sequences into snowflake sequences. When calling the spock sequence-convert function, pass the name of the sequence to be converted and the database name. Like other pgEdge Platform functions, you can include quoted wildcards to include all sequences in that database (‘*’) or to match a specific pattern like all sequences in a schema (‘public.*’).
./pgedge spock sequence-convert ‘public.*’ postgres
During the conversion, the data type of your sequence fields are changed to bigint; any existing values in your database remain the same. When you add new rows to your database, the new sequences will use the format of a snowflake sequence:
SELECT snowflake.format(id) FROM table1;
id | format
-------------------+-----------------------------------------------------------
18014518154714241 | {"node": 1, "ts": "2023-10-16 18:47:12.257+00", "count": 0}
18014518154714242 | {"node": 1, "ts": "2023-10-16 18:47:12.258+00", "count": 0}
(2 rows)
Getting Started
Snowflake sequences are just one benefit you get when using pgEdge Platform (self-hosted) or pgEdge Cloud (fully managed cloud service) to construct, manage, and host your distributed PostgreSQL databases. For more information about pgEdge, visit www.pgedge.com. To try it for yourself you can get started for free by signing up for pgEdge Cloud at https://www.pgedge.com/get-started/cloud , or download pgEdge Platform at https://www.pgedge.com/get-started/platform.
Additional Resources
Snowflake documentation is available at: https://docs.pgedge.com/platform/advanced/snowflake#snowflake-sequences
You can also visit pgEdge at Github for more information about pgEdge Platform at: https://github.com/pgEdge/pgedge
For information about the convenience provided by pgEdge Cloud Developers Edition, visit: https://www.pgedge.com/products/pgedge-cloud