CAP Theorem: Scenarios explained
Prasanna Sridharan
Posted on October 2, 2021
The CAP theorem describes the behavior of a distributed system with respect to three attributes: Consistency, Availability and Partition tolerance. For formal definitions, refer to the wiki.
The theorem defines the choice of behavior that the system can exhibit from the end user's perspective. It states that, at any given point of time, only 2 of the 3 behaviors can be guaranteed.
Let's look at what each of the 3 behaviors means individually:
- Consistency: The system shall always respond to the end user's read request with consistent data. See note.
- Availability: The system shall always respond to a user's request to read/write.
- Partition Tolerance: The system will continue to function even if communication between the nodes drops, i.e. partitions are created within the network. A system that is tolerant to partitions keeps functioning while the partitions cannot interact with each other, with some sacrifices in quality of service.
For the below scenarios, let us consider that there are just two nodes which can hold some data. Node A is responsible for both read and write, whereas Node B is designed to be only read from.
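The two-node setup can be sketched in a few lines of Python. This is only an illustrative model, not a real datastore: the `Node` and `Cluster` classes, the `link_up` flag and the synchronous copy from Node A to Node B are all assumptions made for the example.

```python
class Node:
    """A toy in-memory node; `writable` marks whether it accepts writes."""

    def __init__(self, name, writable):
        self.name = name
        self.writable = writable
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Cluster:
    """Node A takes reads and writes; Node B only serves reads."""

    def __init__(self):
        self.a = Node("A", writable=True)
        self.b = Node("B", writable=False)
        self.link_up = True  # False models a network partition

    def write(self, key, value):
        # Writes always land on Node A, the only writable node.
        self.a.data[key] = value
        if self.link_up:
            # Assumption: replication to Node B is synchronous.
            self.b.data[key] = value


cluster = Cluster()
cluster.write("x", 1)
print(cluster.a.read("x"), cluster.b.read("x"))  # prints: 1 1
```

Setting `cluster.link_up = False` before a write is how this toy model would represent the partition discussed in the scenarios.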
Let's take a look at all 3 possible combinations, starting with the simplest.
Combination 1: Consistency & Availability.
In this combination, it is expected that the system is both consistent and available. It also means the system relies on inter-node communication always working: partitions in the network are not tolerated.
When writes happen on Node A, all reads are supposed to get back the latest data, consistently, as bounded by the system's definition of consistency. The system is also always available: all read/write requests are honored without errors.
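As a rough sketch of this combination, assume the link between the nodes never fails, so every write on Node A can be copied to Node B before it is acknowledged. The dictionaries and the `write`/`read` helpers below are hypothetical stand-ins for the two nodes.

```python
node_a = {}  # read/write node
node_b = {}  # read-only replica


def write(key, value):
    """Write to Node A and replicate to Node B before acknowledging."""
    node_a[key] = value
    node_b[key] = value  # assumes the inter-node link is always up
    return "ok"


def read(node, key):
    """Every read is answered, from whichever node receives it."""
    return node.get(key)


write("balance", 100)
write("balance", 80)
results = [read(node_a, "balance"), read(node_b, "balance")]
print(results)  # prints: [80, 80]
```

Both nodes answer every request and both return the latest value, but only because nothing in the sketch models a broken link.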
Combination 2 & 3:
These combinations apply when the system is expected to keep functioning when network partitions occur, i.e. the system can tolerate them: even if the nodes are unable to communicate, the system will function.
Combination 2: Consistency and Partition tolerance:
When there are 2 or more partitions in the system, the system can be designed to make one of the clusters/partitions the primary and keep the other nodes/partitions dormant.
The primary cluster/partition is active and responsible for all data reads and writes. When read requests are serviced by this primary, the data is guaranteed to be the latest and consistent. However, if read/write requests happen to go to nodes that are in the dormant state, the system may not respond; in other words, availability is not guaranteed.
When the network is partitioned, then in order to keep the system consistent, the partition containing Node A is made primary and the others are made dormant. So any read request to the system will get consistent data (if served by Node A) or no data at all (if served by Node B).
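A minimal sketch of this behavior, assuming the partition containing Node A is chosen as primary. The `CPNode` class, the `Unavailable` exception and the `on_partition` hook are hypothetical names used only for illustration.

```python
class Unavailable(Exception):
    """Raised when a dormant node refuses to serve a request."""


class CPNode:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.dormant = False

    def read(self, key):
        if self.dormant:
            # Refuse to answer rather than risk returning stale data.
            raise Unavailable(f"node {self.name} is dormant")
        return self.data.get(key)


node_a = CPNode("A")
node_b = CPNode("B")
node_a.data["x"] = 1
node_b.data["x"] = 1  # both nodes are in sync before the partition


def on_partition():
    """Keep Node A as primary and make Node B dormant."""
    node_b.dormant = True


def try_read(node, key):
    """Return ("ok", value) or ("unavailable", None)."""
    try:
        return ("ok", node.read(key))
    except Unavailable:
        return ("unavailable", None)


on_partition()
node_a.data["x"] = 2  # the write is accepted by the primary only

print(try_read(node_a, "x"))  # prints: ('ok', 2)
print(try_read(node_b, "x"))  # prints: ('unavailable', None)
```

The reader either gets the latest value or no answer at all; a stale `1` is never returned.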
Combination 3: Availability and Partition tolerance:
When there are 2 or more partitions in the system, the system can instead be designed to function as it did earlier, i.e. the nodes that were responsible for reads and writes continue to handle them, and the nodes that were used for reads continue to respond to the end user's read requests. The end user will always perceive the system to be available; however, the data returned depends heavily on the node to which the read request goes. As there is no inter-node communication, the data may be out of sync, and the end user may perceive the system as inconsistent.
When the network is partitioned, in order to keep the system available, any read request is serviced irrespective of whether the data is stale or not. So, depending on the node that services the request, the data may not be perceived as consistent.
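The stale read can be sketched as follows. The dictionaries, the `link` flag and the `write`/`read` helpers are assumptions made for the example; a real system would replicate over the network rather than by direct assignment.

```python
node_a = {"x": 1}  # read/write node
node_b = {"x": 1}  # read-only replica, in sync before the partition
link = {"up": True}  # False models a network partition


def write(key, value):
    """Writes go to Node A and reach Node B only while the link is up."""
    node_a[key] = value
    if link["up"]:
        node_b[key] = value


def read(node, key):
    """Every read is answered, even if the node's copy may be stale."""
    return node.get(key)


link["up"] = False  # the network partitions
write("x", 2)  # Node A accepts the write; Node B never sees it

print(read(node_a, "x"))  # prints: 2
print(read(node_b, "x"))  # prints: 1
```

Both reads succeed, so the system stays available, but the answer depends on which node serves the request.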
Note on consistency:
The definition of consistency still depends on the consistency model implemented in the system, e.g. Eventual, Session, Strong. For the purpose of the above discussion, when we say the system is consistent, that is bounded by the consistency model the system was designed with.
Note on post communication restoration in case of partition tolerance:
This post only discusses the system's response as soon as nodes are unable to communicate with each other. The techniques used to synchronize the nodes once communication between the nodes/partitions is restored, e.g. a consensus algorithm, are outside the purview of this article.