Cloudflare has officially launched its D1 database!

smartfool

SmartFool

Posted on May 23, 2024

Cloudflare has officially launched its D1 database!

Cloudflare D1 is officially released!

The article mentions some knowledge of concurrent database access, which some of you may not quite understand, so I'll paraphrase below.

First of all, the concept of transaction, a simple understanding is that several operations are bound together, either at the same time successful, or at the same time failed to take the transfer of money, for example, the transfer of money is simply divided into your card deducted and the other card to add money to the two processes, these two processes must ensure that the same time successful or failed at the same time, otherwise there will be a problem, these two operations are bound together called "transaction", they can only be together with the success or failure if the transaction is successful, it is called "commit success" if the transaction fails, it is called "rollback". If the transaction succeeds, it is called "commit success", if the transaction fails, it is called "rollback".

The article mentions three consistency models: Read Committed, Snapshot Isolation, and Serializable. In order to explain these three models, I need to first explain a few concepts: dirty reads / dirty writes / read skew / update loss / write skew

  • Dirty Reading

Simply put, if a transaction "reads" data that has been modified by another "uncommitted transaction", it has two effects. 1:

  1. transaction needs to update multiple objects, dirty reading means that another transaction sees part of the update, which may cause confusion for the user and lead to some unnecessary subsequent operations
  2. if the transaction is aborted, all writes are rolled back, and a dirty read means that he may see data that will be rolled back that was not actually committed.

Suppose there are two bank accounts: Alice's account and Bob's account, Alice's account balance of 1000, Bob's account balance of 500, the process of dirty reads occur:

  1. Start Transaction 1: The banking system starts a transaction (Transaction 1) in which Alice wants to transfer 300 to Bob. 2.
  2. Update Alice's account: In Transaction 1, the system deducts 300 from Alice's account balance, at which time Alice's account balance becomes 700 (but Transaction 1 has not been submitted). 3.
  3. A dirty read occurs: At this point, another banking system operator or customer service system starts a new transaction (Transaction 2), in which it reads Bob's account balance, and because of the dirty read, it reads 800 (which is an update that was not committed in Transaction 1).
  4. Decisions based on dirty reads: Based on the 800 balance read, the banking system or operator may make decisions, such as allowing Bob to make another transaction or loan request, assuming that her account balance is sufficient to cover it.
  5. Transaction 1 aborts: For some reason, Alice's transfer operation goes awry. Transaction 1 is aborted, all changes (i.e., the 300 deduction) are rolled back, and Alice's account balance is restored to 1,000.
  6. Wrong decision: Because transaction 2 made a decision based on a dirty read, it may now result in the bank allowing further transactions or loans from Bob that were based on incorrect account balance information.

If the dirty read had not occurred, then Transaction 2 would have read Bob's account as 500, and the rollback of Transaction 1 would not have caused Transaction 2's decision error.

  • Dirty Writes

A dirty write occurs when a transaction "writes" data that has been modified by another uncommitted transaction. Dirty writes cause the following problems. 1:

  1. Transactions need to update multiple objects, and dirty writes can result in unintended errors. 2.
  2. Preventing dirty writes does not solve the subsequent problem of missing updates, because in the latter case, the second write is indeed executed only after the first transaction commit, which will be discussed later.

Let's take the banking system as an example:

  1. transaction 1: a teller operates a transfer of 100 from Alice to Bob, updating Bob's account balance to 600, but not yet committing it
  2. transaction 2: another teller operates a transfer of 100 from Alice to Bob, and updates Bob's account balance to 700 before transaction 1 is submitted.
  3. Transaction 2 is committed, transaction 1 is aborted, and Bob's account is at 700, with more money coming out of thin air.

This is where the first transaction isolation level is introduced: 'read-commit'.

Read-commit is the most basic level of transactional isolation, and it provides only two guarantees:

  • When reading the database, you will only see data that has been successfully committed (preventing dirty reads)
  • When writing to the database, only data that has been successfully committed will be overwritten (to prevent dirty writes).

Databases often use row locks to prevent dirty writes, here we use mutex locks to update objects to prevent dirty writes. Obviously, locks can also prevent dirty reads, but then reads and writes will compete resulting in poor performance. Do not understand the lock partner can be understood as, only one transaction can obtain a lock, another transaction to obtain the lock can not operate, know that the lock transactions such as obtaining the end of the transaction can continue to operate, obviously, so that the different transactions will be in order to execute, will not result in a dirty read dirty write.

  • Read Skew

If the same data is read multiple times within a transaction, and the data is not the same in the first and second reads, this means that read skew has occurred.

Examples. 1:

  1. Transaction 1: Reads the balance of account Alice, resulting in $1000.
  2. Transaction 2: Transfer $100 from Alice to Bob, and Alice's balance changes to $900 after commit.
  3. Transaction 1: Reads Alice's balance again, and the result is $900.
  4. Transaction 1 in the two reads in the different results, which may lead to business logic in the wrong decision (here you may not think there is nothing wrong, the balance is indeed changed to $ 900, but consider transaction 1 and transaction 2 serialized, then in the process of transaction 1, it must have read the data has always been 1,000, and wait for the completion of the execution of transaction 1 before executing transaction 2)

PS: A read tilt reads committed data, while a dirty read reads uncommitted data.

"Snapshot Isolation is the most common solution to the above problem. The general idea is that each transaction is read from a consistent snapshot of the database, the transaction begins to see the most recently committed data, even if the data may subsequently be modified by another transaction, but to ensure that each transaction will only see the old data at that particular point in time.

So that you may still not understand, let me introduce the MVCC model, to put it bluntly is that each transaction is recorded a version, it will not modify the existing data, take the above example, transaction 1, it reads the version 1 Alice balance of 1,000, when the transaction 2 began, it will create a new version to record Alice's balance, naturally, will not affect the balance of Alice in the version 1. Alice's balance in version 1. But you must have questions here, please continue to see the following update loss problem.

  • Loss of update

Generally, we modify data by reading it first and then modifying it by writing it, that is, the read-modify-write model. In the read-modify-write scenario, due to isolation, the subsequent write operations do not include the values modified by the previous write operations, which ultimately leads to the loss of the previous update.

Example:

  1. Transaction 1: Read Alice's balance of $1000 and plan to subtract $100. 2.
  2. Transaction 2: Read Alice's balance of $1000 and plan to subtract $200. 3.
  3. Transaction 1: Update Alice's balance to $900. 4.
  4. transaction 2: update Alice's balance to $800.

Obviously, the update in transaction 1 was overwritten by transaction 2, resulting in the loss of the changes in transaction 1, which should have resulted in an Alice balance of $700!

MVCC Solution:
On each update, MVCC creates a new version of the data and uses the transaction ID or timestamp to ensure that each transaction is updating based on the version of the data it started with. If the data version is found to have been modified at commit time, the transaction is rolled back.

Example:

  • Transaction 1 reads Alice's balance as $1000 and plans to subtract $100.
  • Transaction 2 also reads Alice's balance of $1000 and plans to subtract $200.
  • Transaction 1 attempts to update Alice's balance to $900, but upon commit it realizes that the data (version) has been changed and rolls back.
  • Transaction 2 successfully updates the balance to $800.

  • Write skew

There are two competing conditions triggered by multiple transactions writing to the same object at the same time, the dirty writes and lost updates described earlier, and then there is the more subtle case of write conflicts.

Suppose you are now developing a system that is going to give hospital doctors shifts, requiring at least one doctor to be on duty, when both Alice and Bob are unwell and then request a transfer at almost the same moment, with each transaction always first checking that at least two of the doctors are currently on duty, and if so, one is available for a transfer. Now that the database is using snapshot level isolation, both checks return that there are two doctors, so both transactions can proceed, and then they each update their on-call records, and both transactions are successfully committed, with the final result that no one is on-call.

This situation is defined as a write skew, where the two transactions update two different objects, and where the competition is less obvious. Imagine: if the two transactions were executed serially, the second doctor's request would have been rejected, and only the simultaneous execution of both transactions would have been abnormal.

One can think of write skew as a more generalized update loss problem. That is, if two transactions read the same set of objects and then update some of them:

  • Different transactions may update different objects, then write skew may occur
  • Different transactions may update different objects, and write skew may occur.

We have already given how to guard against update loss, but the options available for write skew have many limitations.

  • Since multiple objects are involved, single-object atomic operations won't work
  • Automatic detection of lost updates based on snapshot-level isolation is also problematic, and preventing write skew requires 'serializable' isolation.

Another distinction to be made is between 'serializability' and 'linearizability'. Serializability is the guarantee that the effect of concurrency will be the same as the effect of serialization, whereas linearizability refers to whether or not the performance of multiple nodes in a distributed system will look the same as that of a single node.

After introducing these concepts, let's go back and look at how cf does it. First of all, by default, D1 uses the "snapshot isolation" provided by SQLite, which is the scope of transaction isolation.

Then at the beginning of the article mentioned to forward all the requests to the same database, which is actually a simplified implementation of linearizability, read the copy of the back of the problems caused by the linearizability of the problem, coupled with the Lamport timestamp is to achieve linearizability, here is a little more complex, the article is actually saying more is how to ensure that the linearizability of the article, but the space is limited, there is a time to write about the linearizability of a separate linearizability-related stuff.

💖 💪 🙅 🚩
smartfool
SmartFool

Posted on May 23, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related