UUID or ULID: Awesomeness of Unique Identifiers!
jiisanda🙆♂️
Posted on March 19, 2024
Welcome! In this article we are going to have the showdown of two prominent choices of unique identifiers: UUID and ULID! 😲
In the landscape of software development, the task of generating unique identifiers has always been a crucial challenge. Whether it's managing database keys, tracking events in distributed systems, or ensuring session security, the choice of identifier can significantly impact the efficiency and performance of your application. In this showdown both parties present their strengths, the factors that could make you choose one over the other, a glimpse of their implementation, and their weaknesses. So sit back, get some popcorn 🍿, and get ready for a showdown that will empower you to make an informed decision for your next project.
Stage Setup
In software development, unique identifiers play a crucial role in ensuring data integrity, system scalability, and security. They act as unique markers for various entities within a system, like database records, distributed events, and user sessions.
Traditional auto-incrementing IDs, while simple, can become problematic at scale, leading to performance issues, collision risks, and data leakage.
If you need an explanation of any of these problems, leave a comment, happy to answer...
So, enter the two powerful alternatives in the world of unique identifiers: UUIDs and ULIDs!
UUIDs are the OGs of uniqueness. With their standardized format and 128-bit punch, these chads say, "I'm globally unique, baby!" 😎 as they boast their universal uniqueness across the cosmos 💫!
But then come ULIDs, the new kids who bring a whole new step to the game! ULIDs ain't just unique, they're also lexicographically sortable, baby! With a blend of timestamp sweetness and randomness, ULIDs slide into your codebase like a smooth saxophone 🎷 on a summer night.
So what's the fuss, you ask? Well, UUIDs bring tried-and-true reliability, perfect for when you need global uniqueness. But ULIDs? They are all about time-based sorting, making them the go-to identifiers where chronological order is the thing.
UUIDs (Universally Unique Identifiers)
UUIDs are different from sequential IDs. RFC 4122 says:
UUIDs are of a fixed size (128 bits) which is reasonably small compared to other alternatives. This lends itself well to sorting, ordering, and hashing of all sorts, storing in databases, simple allocation, and ease of programming in general.
Layout and Byte order of UUID
timestamp
- 60 bit value
- UUID1: UTC time, counted in 100-nanosecond intervals since 00:00:00.00, 15th Oct 1582
- UUID3 and 5: timestamp (60 bits) is constructed from a name (see algo below)
- UUID4: timestamp is randomly or pseudo-randomly generated. (see algo)
clock-sequence (14 bit)
- UUID1: clock sequence is used to help avoid duplicates that could arise when the clock is set backwards in time or if the node ID changes.
- UUID 3 and 5: 14-bit value constructed from a name (see Algo below)
- UUID4: randomly or pseudo-randomly generated. (see algo)
node (48-bit)
- UUID1: node field is an IEEE MAC address, usually host address.
- UUID 3 and 5: Constructed from name (see algo)
- UUID4: randomly or pseudo-randomly generated (see algo)
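To make the layout above concrete, here is a small sketch using only Python's standard uuid module. It splits the predefined NAMESPACE_DNS value (which happens to be a version 1 UUID) into the three parts listed above. The helper name split_uuid is made up for this illustration.

```python
import uuid

def split_uuid(u: uuid.UUID) -> dict:
    """Return the timestamp, clock-sequence and node parts of a UUID (illustrative helper)."""
    return {
        "timestamp": u.time,       # 60 bits: time_low + time_mid + time_hi (version bits removed)
        "clock_seq": u.clock_seq,  # 14 bits: clock_seq_hi (variant bits removed) + clock_seq_low
        "node": u.node,            # 48 bits
        "version": u.version,
    }

parts = split_uuid(uuid.NAMESPACE_DNS)
for name, value in parts.items():
    print(f"{name:>9}: {value:#x} ({value.bit_length()} bits used)")
```

The three parts add up to 60 + 14 + 48 = 122 bits; the remaining 6 bits of the 128 hold the version and the variant.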
RFC 4122's Basic Algorithm section describes an algorithm for generating UUIDs when they do not need to be generated frequently, but it had some issues, and hence different versions of UUIDs were defined.
Let's look briefly at each and peek at its Python implementation...
UUID1 (MAC Address + timestamp)
UUID1 concatenates the 48-bit MAC Address of the "node" (computer generating the UUID), with a 60-bit timestamp. The Python implementation is as follows:
def uuid1(node=None, clock_seq=None) -> UUID:
    """Generate a UUID from a host ID, sequence number, and the current time.
    If 'node' is not given, getnode() is used to obtain the hardware
    address. If 'clock_seq' is given, it is used as the sequence number;
    otherwise a random 14-bit sequence number is chosen."""
    # some code (computes the 60-bit timestamp, clock_seq, and node)
    time_low = timestamp & 0xffffffff            # low 32 bits of the timestamp
    time_mid = (timestamp >> 32) & 0xffff        # middle 16 bits
    time_hi_version = (timestamp >> 48) & 0xfff  # high 12 bits; UUID() adds the version bits
    clock_seq_low = clock_seq & 0xff
    clock_seq_hi_variant = (clock_seq >> 8) & 0x3f
    return UUID(fields=(time_low, time_mid, time_hi_version,
                        clock_seq_hi_variant, clock_seq_low, node), version=1)
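Since a version 1 UUID stores its creation time, that time can be read back out. The sketch below is not part of the uuid module; the helper name uuid1_to_datetime is an assumption made for illustration. It relies only on the documented UUID.time attribute, which holds the 60-bit timestamp.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the Gregorian calendar, the epoch used by version 1 UUIDs.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def uuid1_to_datetime(u: uuid.UUID) -> datetime:
    """Recover the creation time stored in a version 1 UUID (illustrative helper)."""
    if u.version != 1:
        raise ValueError("only version 1 UUIDs carry a real timestamp")
    # u.time counts 100-nanosecond intervals; 10 of them make one microsecond.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

u = uuid.uuid1()
print(u, "was created at", uuid1_to_datetime(u).isoformat())
```

This also hints at the data-leakage angle: anyone holding a UUID1 can learn when, and on which node, it was generated.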
UUID 3 and 5
Versions 3 and 5 are name-based UUIDs: they are derived by hashing a name within a name space. Examples of name spaces are the domain name system, URLs, or reserved words in programming languages. Python's uuid module predefines the following name space IDs:
NAMESPACE_DNS = UUID('6ba7b810-9dad-11d1-80b4-00c04fd430c8')
NAMESPACE_URL = UUID('6ba7b811-9dad-11d1-80b4-00c04fd430c8')
NAMESPACE_OID = UUID('6ba7b812-9dad-11d1-80b4-00c04fd430c8')
NAMESPACE_X500 = UUID('6ba7b814-9dad-11d1-80b4-00c04fd430c8')
def uuid3(namespace, name):
    """Generate a UUID from the MD5 hash of a namespace UUID and a name."""
    if isinstance(name, str):
        name = bytes(name, "utf-8")  # the hash functions need bytes
    from hashlib import md5
    digest = md5(
        namespace.bytes + name,
        usedforsecurity=False
    ).digest()
    return UUID(bytes=digest[:16], version=3)

def uuid5(namespace, name):
    """Generate a UUID from the SHA-1 hash of a namespace UUID and a name."""
    if isinstance(name, str):
        name = bytes(name, "utf-8")  # the hash functions need bytes
    from hashlib import sha1
    hash = sha1(namespace.bytes + name).digest()
    return UUID(bytes=hash[:16], version=5)
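The practical consequence of hashing is that name-based UUIDs are deterministic: the same namespace and name always give the same UUID. A short sketch using the standard library, where uuid5_by_hand is a made-up helper that repeats the steps above outside the module:

```python
import hashlib
import uuid

def uuid5_by_hand(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Re-derive a version 5 UUID step by step (illustrative helper)."""
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    # Keep the first 16 bytes; UUID(version=5) overwrites the version and variant bits.
    return uuid.UUID(bytes=digest[:16], version=5)

a = uuid.uuid5(uuid.NAMESPACE_DNS, "python.org")
b = uuid5_by_hand(uuid.NAMESPACE_DNS, "python.org")
c = uuid.uuid5(uuid.NAMESPACE_URL, "python.org")

print(a)
print("same as hand-made:", a == b)
print("same in another namespace:", a == c)
```

This makes versions 3 and 5 handy when the same input must map to the same identifier on every machine, without any coordination.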
UUID4
Version 4 is meant for generating UUIDs from truly random or pseudo-random numbers.
def uuid4():
    """Generate a random UUID."""
    return UUID(bytes=os.urandom(16), version=4)
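Not all 128 bits of a UUID4 are random: 4 bits hold the version and 2 bits hold the variant, leaving 122 random bits. The sketch below sets those bits by hand to show where they live. The helper uuid4_by_hand is hypothetical; in real code uuid.uuid4() does this for you.

```python
import os
import uuid

def uuid4_by_hand() -> uuid.UUID:
    """Build a random UUID and set the version/variant bits manually (illustrative helper)."""
    value = int.from_bytes(os.urandom(16), "big")
    value &= ~(0xF << 76)   # clear the 4 version bits
    value |= 4 << 76        # version 4
    value &= ~(0x3 << 62)   # clear the 2 variant bits
    value |= 0x2 << 62      # variant "10" = RFC 4122
    return uuid.UUID(int=value)

u = uuid4_by_hand()
print(u, u.version, u.variant)
```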
Now, if none of hex, bytes, fields, or int is given, the UUID() class raises a TypeError saying that one of the hex, bytes, fields, or int arguments must be given.
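A quick way to see this behaviour, using only the standard library (the variable names are arbitrary):

```python
import uuid

try:
    uuid.UUID()  # no hex, bytes, fields or int given
except TypeError as exc:
    print("TypeError:", exc)

# Any one of the accepted arguments is enough, and they all describe the same value.
from_hex = uuid.UUID(hex="6ba7b810-9dad-11d1-80b4-00c04fd430c8")
from_int = uuid.UUID(int=from_hex.int)
from_bytes = uuid.UUID(bytes=from_hex.bytes)
print(from_hex == from_int == from_bytes)
```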
That was all about UUIDs. If you want to know more about them, do look at RFC 4122...
ULIDs (Universally Unique Lexicographically Sortable Identifiers)
A ULID is 128-bit compatible with UUIDs, and 1.21e+24 unique ULIDs can be generated per millisecond. As the name suggests, they are lexicographically sortable. ULIDs are case insensitive and contain no special characters, so they are URL safe.
Layout
In general, the structure of a ULID is as follows:
 01AN4Z07BY      79KA1307SR9X4MV3
|----------|    |----------------|
 Timestamp          Randomness
   48bits             80bits
timestamp
- 48 bit integer
- UNIX-time in milliseconds
- Won't run out of space 'til the year 10889 AD.
randomness
- 80 bits
- Cryptographically secure source of randomness, if possible.
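Both headline numbers follow directly from the bit widths above, and the arithmetic is easy to check. The only assumption in this sketch is the average Gregorian year length of 365.2425 days:

```python
# 80 random bits: how many distinct values fit into one millisecond?
ids_per_ms = 2 ** 80
print(f"{ids_per_ms:.2e}")  # 1.21e+24

# 48-bit millisecond timestamp: in which year does it overflow?
max_ms = 2 ** 48 - 1
seconds_per_year = 365.2425 * 24 * 60 * 60  # average Gregorian year
last_year = 1970 + max_ms / 1000 / seconds_per_year
print(int(last_year))  # 10889
```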
Sorting, Encoding, and Monotonicity
The left-most character must be sorted first, and the right-most character sorted last (lexical order). The default ASCII character set is used. For encoding Crockford's Base32 is used as shown below. This alphabet excludes I, L, O and U to avoid confusion and abuse.
0123456789ABCDEFGHJKMNPQRSTVWXYZ
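Here is a minimal sketch of that encoding, written from the description above rather than taken from any ULID library. The function name encode_ulid is an assumption. Because the alphabet is in ascending ASCII order, comparing the resulting strings gives the same order as comparing the underlying numbers.

```python
import os
import time

CROCKFORD32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def encode_ulid(timestamp_ms: int, randomness: bytes) -> str:
    """Pack a 48-bit timestamp and 80 random bits into 26 Base32 characters (illustrative helper)."""
    if not 0 <= timestamp_ms < 2 ** 48 or len(randomness) != 10:
        raise ValueError("need a 48-bit timestamp and exactly 10 random bytes")
    value = (timestamp_ms << 80) | int.from_bytes(randomness, "big")
    chars = []
    for _ in range(26):  # 26 characters x 5 bits cover the 128 bits
        chars.append(CROCKFORD32[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))

now_ms = time.time_ns() // 1_000_000
earlier = encode_ulid(now_ms - 1, os.urandom(10))
later = encode_ulid(now_ms, os.urandom(10))
print(earlier, later, earlier < later)
```

26 characters carry 130 bits, two more than needed, which is why the first character of a valid ULID never goes above 7.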
When generating several ULIDs within the same millisecond, a generator can still provide guarantees regarding sort order. If the same millisecond is detected, the random component is incremented by 1 bit in the least significant bit position (with carrying).
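The monotonic rule can be sketched as a small factory that remembers the last timestamp and random value. This is an illustration under assumptions, not any library's implementation: the class name MonotonicUlidFactory and the optional now_ms parameter are made up, and it returns raw 128-bit integers to stay short.

```python
import os
import time

class MonotonicUlidFactory:
    """Hypothetical helper: 128-bit ULID integers that increase within one millisecond."""

    def __init__(self):
        self._last_ms = -1
        self._last_rand = 0

    def new(self, now_ms=None) -> int:
        if now_ms is None:
            now_ms = time.time_ns() // 1_000_000
        if now_ms == self._last_ms:
            # Same millisecond detected: bump the random part by one.
            self._last_rand += 1
            if self._last_rand >= 1 << 80:
                raise OverflowError("randomness exhausted within one millisecond")
        else:
            # New millisecond: draw 80 fresh random bits.
            self._last_rand = int.from_bytes(os.urandom(10), "big")
        self._last_ms = now_ms
        return (now_ms << 80) | self._last_rand

factory = MonotonicUlidFactory()
first, second, third = (factory.new(now_ms=1_000) for _ in range(3))
print(first < second < third)
```

Passing now_ms explicitly is only there to force the same-millisecond case; without it the factory reads the system clock.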
Usage
You would usually create a new ULID object by calling the default constructor with no arguments. In that case it fills the timestamp part with the current datetime. To encode the object, it is usually converted to a string.
You can also create ULIDs from other values passed as arguments: from a timestamp, from a UUID, from hex or bytes, from a string, or from a datetime.
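Converting between a ULID and a UUID works because both are just 128-bit numbers. The sketch below shows the idea with the standard library only, without assuming any particular ULID package's method names. decode_ulid is a hypothetical helper, and the sample string is the example ULID from the ULID specification.

```python
import uuid
from datetime import datetime, timezone

CROCKFORD32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def decode_ulid(text: str) -> int:
    """Turn a 26-character ULID string into its 128-bit integer (illustrative helper)."""
    if len(text) != 26:
        raise ValueError("a ULID string has exactly 26 characters")
    value = 0
    for ch in text.upper():  # decoding is case insensitive
        value = (value << 5) | CROCKFORD32.index(ch)
    if value >= 2 ** 128:
        raise ValueError("value does not fit into 128 bits")
    return value

value = decode_ulid("01ARZ3NDEKTSV4RRFFQ69G5FAV")
as_uuid = uuid.UUID(int=value)  # same 128 bits, UUID notation
created = datetime.fromtimestamp((value >> 80) / 1000, tz=timezone.utc)
print(as_uuid, created.isoformat())
```

The top 48 bits give back the creation time, which is what makes a ULID sortable and also what it reveals to anyone who sees it.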
Advantages of ULID over UUIDs
- shorter string representation (26 characters in ULIDs vs 36 in UUIDs)
- Sortability for efficient ordering and retrieval.
- Potential performance benefits in certain scenarios, such as:
  - In databases that use sorted indexes, ULIDs can potentially improve query performance because they leverage the existing sort order of the index.
  - When working with time-series data, ULIDs (which include a timestamp component) can be stored and retrieved in chronological order without additional sorting.
Choosing the Right Champion
Now that we've explored both UUIDs and ULIDs, let's help you pick the champion for your next project!
Here is a quick comparison:
Feature | UUIDs | ULIDs |
---|---|---|
Uniqueness | Practically guaranteed (collisions are extremely unlikely) | Practically guaranteed (collisions are extremely unlikely) |
Sortability | No | Yes (Lexicographically) |
String Length | 36 characters | 26 characters |
Performance | Generally Good | Potentially better with sorted indexes/time-series data |
When to choose ULIDs
- Sortability is essential: ULIDs excel when you need to efficiently sort or filter your identifiers.
- Performance optimization matters: In scenarios with sorted indexes or time-series data, ULIDs can potentially offer performance benefits.
- Compactness is desired: The shorter string length of ULIDs can be a space-saving advantage.
When to use UUIDs
- Focus on guaranteed uniqueness: If the absolute certainty of no collision is paramount, UUIDs are the established choice.
- Sorting isn't a priority: If order doesn't matter for your identifiers, UUIDs function perfectly well.
Ultimately, the best choice depends on your specific project requirements. Weigh the importance of uniqueness, sortability, performance, and string length to make an informed decision.
Example
Let's take a look at how you can generate a UUID and a ULID in Python...
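# Generate a ULID (assumes the python-ulid package: pip install python-ulid)
from ulid import ULID

new_ulid = ULID()  # the default constructor fills in the current time
print(f"ULID: {new_ulid}")  # 26 characters, e.g. 01ARZ3NDEKTSV4RRFFQ69G5FAV

# Generate a UUID (version 4 - random)
import uuid

new_uuid = uuid.uuid4()  # named new_uuid so the uuid module is not shadowed
print(f"UUID: {new_uuid}")  # 36 characters: xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx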
Conclusion
This article has explored the strengths and weaknesses of two contenders in the unique identifier arena: UUIDs and ULIDs.
Key Takeaways
- Both UUIDs and ULIDs guarantee uniqueness, a crucial aspect for data integrity and security.
- UUIDs reign supreme when you prioritize absolute uniqueness and don't require sorting capabilities.
- ULIDs shine when sortability and potentially improved performance are key considerations, thanks to their lexicographic sorting and timestamp component.
- The compact string representation of ULIDs (26 characters) offers a space-saving advantage compared to UUIDs (36 characters).