Design A Unique ID Generator In Distributed Systems

zeeshanali0704

ZeeshanAli-0704

Posted on August 20, 2024

Design A Unique ID Generator In Distributed Systems

Table of Contents

  1. Problem Overview
  2. Functional Requirements (FRs)
  3. Non Functional Requirements (NFRs)
  4. Detailed Design Options
  5. Conclusion
  6. More Details

Problem Overview:

You are tasked with designing a unique ID generator that works efficiently in a distributed system. Traditional database solutions, like using a primary key with an auto_increment attribute, aren't suitable for distributed environments due to scalability challenges and the difficulty of synchronizing IDs across multiple databases with minimal delay.

Functional Requirements (FRs):

  1. Uniqueness:

    • Requirement: Each ID generated by the system must be unique across the entire distributed environment.
    • Example: If two different servers in the system generate IDs simultaneously, they must produce different IDs to avoid conflicts. This ensures that each entity in the system (e.g., users, orders, transactions) can be uniquely identified.
  2. Numerical Values:

    • Requirement: IDs should consist solely of numerical values.
    • Example: An ID might look like 163717284912345, which is purely a numerical string. This is important for compatibility with systems that require numeric IDs for indexing or sorting.
  3. 64 Bit Length:

    • Requirement: Each ID should fit into a 64-bit integer, which can store values up to approximately 9.22 × 10^18.
    • Example: A 64-bit ID might be 123456789012345678, ensuring it meets storage efficiency needs and fits within typical database constraints for integer fields.
  4. Ordered by Date:

    • Requirement: IDs should reflect the order of creation based on time, meaning newer IDs are larger than older ones.
    • Example: An ID generated at 9 AM might be 163717284912345, and another ID generated at 5 PM on the same day might be 163717584912345. This allows sorting by time to be straightforward, facilitating easy retrieval of recent records.
  5. High Throughput:

    • Requirement: The system must generate a large number of unique IDs efficiently, specifically at least 10,000 IDs per second.
    • Example: If the system is handling a high volume of transactions, such as an e-commerce platform during a sale, it must be capable of issuing over 10,000 unique transaction IDs per second without bottlenecks.

Non Functional Requirements (NFRs):

  1. Scalability:

    • Requirement: The ID generation system should scale effectively as the number of servers or data centers increases.
    • Example: As more servers are added to handle increased traffic, the system should maintain consistent ID generation performance and ensure no duplicate IDs are created across different servers.
  2. Low Latency:

    • Requirement: ID generation should be fast, with minimal delay to avoid becoming a bottleneck in the system.
    • Example: When a user places an order on an e-commerce site, the system should generate the order ID almost instantly, ensuring the user's experience is seamless.
  3. Reliability:

    • Requirement: The system should be reliable, with minimal risk of failure, even in the case of hardware or network issues.
    • Example: If a server goes down, the system should still be able to generate unique IDs without interruption, possibly by redistributing the ID generation load to other servers.
  4. Decentralization:

    • Requirement: The system should not rely on a central authority or server to generate IDs, reducing the risk of a single point of failure.
    • Example: Each server in a data center should be capable of generating IDs independently, so if one server fails, others can continue generating IDs without waiting for the failed server to recover.
  5. Ease of Implementation:

    • Requirement: The design should be straightforward to implement and maintain, with clear procedures for handling system upgrades or scaling.
    • Example: If the system needs to be scaled to more servers, the implementation should require minimal changes to the codebase or architecture, making it easy for engineers to deploy and maintain.

Detailed Design Options:

Multi Master Replication

  • Description: This method uses multiple database servers with the auto_increment feature, but instead of incrementing by 1, it increments by a value k, where k is the number of database servers.
  • FRs Met: Uniqueness, Numerical Values.
  • NFRs Challenges: Scalability issues arise when servers are added or removed, and IDs may not always increase with time.

Universally Unique Identifier (UUID)

  • Description: A UUID is a 128-bit value generated independently by each server. It's simple and scalable but doesn't meet the 64-bit or numerical requirements.
  • FRs Met: Uniqueness.
  • NFRs Challenges: Doesn't fit into 64 bits, IDs are not time-ordered, and may not be purely numerical.

Ticket Server

  • Description: A centralized server generates IDs using auto_increment. It's simple to implement but creates a single point of failure.
  • FRs Met: Uniqueness, Numerical Values, Ordered by Date.
  • NFRs Challenges: Scalability and reliability issues due to the centralized design.

Twitter Snowflake
The Twitter Snowflake approach refers to a unique ID generation system that Twitter developed to create scalable, distributed, and unique IDs across their infrastructure. The Snowflake ID system is particularly useful for distributed databases and systems where it is essential to generate unique identifiers without a central coordination point. Here’s an overview of how it works:

Key Concepts:
  1. 64 bit ID: Snowflake generates 64-bit IDs, which are large enough to store a lot of unique identifiers.

  2. Structure of Snowflake ID:

    • Timestamp (41 bits): Represents the number of milliseconds since a custom epoch. This part of the ID ensures that IDs are generally ordered by time.
    • Data Center ID (5 bits): Identifies the data center where the ID was generated, allowing for geographical distribution.
    • Machine ID (5 bits): Identifies the specific machine within the data center where the ID was generated.
    • Sequence Number (12 bits): A local counter per machine that allows the generation of multiple IDs in the same millisecond.
  3. Decentralization: The Snowflake ID generation is decentralized. Each machine in Twitter's infrastructure can generate IDs independently without requiring coordination with other machines. This independence enables the system to scale massively and reduce latency.

  4. Uniqueness and Order: The IDs are unique across the system, and because the timestamp is part of the ID, they are generally (though not strictly) ordered by creation time.

Example of Snowflake ID Breakdown:

For a Snowflake ID like 123456789012345678, you could break it down as follows:

  • Timestamp: 41 bits representing the time in milliseconds since the epoch.
  • Data Center ID: 5 bits for the data center.
  • Machine ID: 5 bits for the specific machine.
  • Sequence Number: 12 bits for the sequence number on that machine at that millisecond.
Advantages:
  • Scalability: The system scales well across multiple data centers and machines.
  • High Throughput: Can generate thousands of unique IDs per second per machine.
  • Decentralized: No need for a central coordination system, reducing the risk of bottlenecks.
Use Cases:
  • Distributed Systems: Suitable for generating IDs in distributed systems where consistency and uniqueness are crucial.
  • Database Primary Keys: Used as primary keys in databases, particularly in systems that require sharding or partitioning.

The Snowflake approach is widely recognized for its efficiency and has inspired similar ID generation systems in other large-scale distributed environments.

Conclusion:

The Twitter Snowflake approach is the most suitable solution for generating unique IDs in a distributed system, as it meets all the functional and non-functional requirements. It offers a balance of scalability, low latency, and reliability, making it ideal for systems that need to generate a large volume of unique IDs quickly and efficiently.

More Details:

Get all articles related to system design
Hastag: SystemDesignWithZeeshanAli

systemdesignwithzeeshanali

Git: https://github.com/ZeeshanAli-0704/SystemDesignWithZeeshanAli

💖 💪 🙅 🚩
zeeshanali0704
ZeeshanAli-0704

Posted on August 20, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related

Database Replication in System Design
systemdesignwithzeeshanali Database Replication in System Design

November 3, 2024

Apache Zookeeper
systemdesignwithzeeshanali Apache Zookeeper

October 2, 2024

Redis and its role in System Design
systemdesignwithzeeshanali Redis and its role in System Design

October 2, 2024

Cache Eviction Policies | System Design
systemdesignwithzeeshanali Cache Eviction Policies | System Design

October 31, 2024

RPC vs REST: A Comprehensive Comparison
systemdesignwithzeeshanali RPC vs REST: A Comprehensive Comparison

September 25, 2024