Different methods to generate unique token ids (2024)

3 min read

Dec 25, 2022

(This post is just a note to improve my knowledge of #systemdesign).

In any system that we design, we may need to identify each object uniquely. Very often, this token generator module itself can become a performance and concurrency bottleneck.

Different methods to generate unique token ids (2)

Will be outlining all the methods that I come across for unique token generation in this post from simple to hard and complicated. As always, I expect to update this page from time as whenever I find new ways to generate unique ids.

use a monotonically increasing counter and assign whenever a new request for a unique resource id is made
+ simple to use
— additional mechanism needed to use concurrently; counterspace needs to be partitioned to be used in parallel
— last assigned value needs to be persisted in stable storage
use hashing function like MD5|SHA1|SHA2 or equivalent to generate a unique hash function.
+ easy to use and low probability of collisions
— MD5 (128bit|16bytes) SHA1–1(160bit|40bytes),SHA2(28..64bytes) still too long to uniquely identify a resource in certain usecases viz. URL shortener|pastebin
— computation of the cryptographic hash needs CPU cycles
— id tied to URL, cannot generate more than 1 shorturl or custom id to identify resource
— cannot be precomputed|batch computed
use UUID / GUID
+ no centralized authority is required to administer them
+ can be independently generated without any fear of collisions as they are generated unique to systems MAC id.
+ do not depend on input so can be batch generated and preallocated.
— UUID(128bits|32bytes) too long for certain usecases viz bitly|pastebin
Use PRIMARY key in DB and AUTOINCREMENT in MySQL
* MongoDB provides ObjectID, MySQL|MariaDB provide AUTO_INCREMENT, MS SQL Server provides IDENTITY
* Is a form of Persistence Layer Generated IDs
— generated ID is not known to your code without a roundtrip to the database
— extra roundtrip to the RDBMS may slow down your application
SnowflakeID aka TwitterID
* Snowflake ID is a form of unique identifier (64bits|8bytes) used in distributed computing. * Form of ID server|s. Implementation can be a single server creating Ids or a cluster of servers creating high numbers of IDs per second.
* created by Twitter and is used for the IDs of tweets, adopted by other companies, including Discord and Instagram
* Instagram uses a modified version( 41 bits for a timestamp, 13 bits for a shard ID, and 10 bits for a sequence number.)
* Twitter was looking for a minimum of 10000 IDs per sec per process with a response rate of <2ms.
* ID servers should require no network coordination at all between them and generated Ids should be, roughly, time-ordered.
* Format: Time(41 bits timestamp with ms precision)+ 10 bits(machine ID)
+ 12 bits(per-machine sequence number, rolls every 4096 per machine), to allow creation of multiple snowflakes in the same ms.
* Final number is generally serialized in decimal
* Sortable by time, because they are based on the time they were created
* Time a snowflake was created can be calculated from the snowflake.
* More on it at https://en.wikipedia.org/wiki/Snowflake_ID
ULID: Unique Lexicographically Sorted Individual Identifiers
* Relatively new mechanism used by Shopify, 50% faster then UUID generation
* ULIDs have a time component and sortable
* ULIDS have a compact string representation
* https://github.com/ulid/spec

1.21e+24 unique ULIDs per millisecond
Lexicographically sortable!
Canonically encoded as a 26 character string, as opposed to the 36 character UUID
Case insensitive
Uses Crockford’s base32 for better efficiency and readability (5 bits per character) incl. all 0–9 digits but alphabet excludes the letters I, L, O, and U to avoid confusion and abuse.
No special characters (URL safe)
Monotonic sort order (correctly detects and handles the same millisecond)
* https://www.youtube.com/watch?v=f53-Iw_5ucA

> ttttttttttrrrrrrrrrrrrrrrr where t is Timestamp (10 characters) r is Randomness (16 characters)

https://code.flickr.net/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/
- https://ebrary.net/64743/computer_science/partitioning_hash
- https://dba.stackexchange.com/questions/166843/what-index-to-use-with-lots-of-duplicate-values
- https://code.flickr.net/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/
- https://betterprogramming.pub/uuid-generation-snowflake-identifiers-unique-2aed8b1771bc?gi=605a222fa212
- https://blog.bytebytego.com/p/ep36-types-of-databases-and-use-cases
- https://towardsdatascience.com/choosing-the-right-database-in-a-system-design-interview-b8af9c6dc525?gi=0379cabc721e
- https://medium.com/timesopen/collective-decision-making-with-ahp-3ef819e5bc2a
- http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
- https://firebase.googleblog.com/2015/02/the-2120-ways-to-ensure-unique_68.html
- https://levelup.gitconnected.com/how-to-generate-unique-ids-in-distributed-systems-6-key-strategies-37a8ab3b367d