How to elegantly implement a multi-database outbox pattern
yedf2
Posted on July 12, 2022
Introduction to the Outbox pattern
A microservice may need to perform two steps, "save data" and "send events". For example, after publishing an article, the author's posting statistics need to be updated. The business requirement is that both operations fail or succeed at the same time, rather than one succeeding and one failing. If the article is eventually published and the update of posting statistics fails, the data will be inconsistent.
The outbox pattern is the most common pattern used to solve this problem and works as follows.
- the local business runs as a transaction, writing events to the message table before committing the transaction; when the transaction is committed, it commits both the business, and the events
- the events are sent to the message queue by polling the message table or listening to the binlog
- Polling: retrieve events from the message table every 1s or 0.2s, send them to the message queue, and then delete them
- Listening to the binlog: using a database tool such as Debezium, listen to the database binlog, fetch the events and send them to the message queue
- Write consumers and process events
As in 1, the business and the event are updated in the same transaction, ensuring that both will be committed at the same time.
In steps 2,3, both are operations that will not fail and will be retried and eventually succeed if a downtime event occurs in between, etc.
For the aforementioned post-commit statistics scenario, the above solution ensures that the statistics are finally updated and the data will reach eventual consistency
Problems with multiple databases
With today's popular microservices architecture, a single database is usually used for a microservice. When multiple services need to use the outbox model, then the traditional outbox architecture is more difficult to maintain.
- Polling for events: multiple database polling tasks need to be written in the polling process
- Listening to binlog for events: you need to listen to multiple database binlogs
Both of the above methods of fetching events are not very maintainable when dealing with a large number of databases. The architecture is not very resilient and if there are many databases and few events generated at any one time, the load on the architecture will be high and resources will be wasted. The ideal architecture load is one that is only related to the number of events sent, and not to other factors.
Solution
The open source distributed transaction framework https://github.com/dtm-labs/dtm has a two-stage message pattern inside that handles this problem very well. The following is an example of the use of an interbank transfer operation.
Go
msg := dtmcli.NewMsg(DtmServer, gid).
Add(busi.Busi+"/TransIn", &TransReq{Amount: 30})
err := msg.DoAndSubmitDB(busi.Busi+"/QueryPreparedB", db, func(tx *sql.Tx) error {
return busi.SagaAdjustBalance(tx, busi.TransOutUID, -req.Amount, "SUCCESS")
})
In this part of the code
- First generate a msg global transaction for the DTM, passing the server address of the DTM and the global transaction id
- Add a branch business logic to the msg, here the business logic is the transfer operation TransIn
- Then call msg's DoAndSubmitDB, a function that ensures that the business is executed successfully and the msg global transaction is committed, either succeeding or failing at the same time
- the first parameter for the check-back URL, the detailed meaning will be described later
- the second parameter is sql.DB, which is the database object accessed by the business
- the third parameter is the business function, the business in our example is to deduct $30 from the balance of A
Success Flow
How does DoAndSubmitDB ensure the atomicity of successful business execution and msg submission? Consider the following timing diagram.
In general, the 5 steps in the timing diagram will complete normally, the whole business proceeds as expected and the global transaction completes. There is a new element that needs to be explained here, which is that the commit of a msg is initiated in two phases, the first phase calls Prepare and the second phase calls Commit. when DTM receives the Prepare call, it does not call the branch transaction, but waits for the subsequent Submit. only when it receives the Submit, it starts the branch call and finally completes the global transaction.
Exceptions
In a distributed system, all types of downtime and network exceptions need to be considered, so let's look at the following possible problems.
First of all the most important goal we want to achieve is that the business executes successfully and the msg transaction is an atomic operation, so what if in the previous timing diagram, after the Prepare
message is sent successfully and before the Submit
message is sent successfully, what happens if there is an abnormal downtime? At this point dtm will detect that the transaction has timed out and will check back. For developers, this check-back is as simple as pasting in the following code.
Go
app.GET(BusiAPI+"/QueryPreparedB", dtmutil.WrapHandler2(func(c *gin.Context) interface{} {
return MustBarrierFromGin(c).QueryPrepared(dbGet())
}))
If you are using something other than the go framework gin
, then you will need to make some minor modifications to suit your framework, but the code is generic and suitable for each of your businesses.
The main principle of check-back is mainly through the message table, but dtm's check-back has been carefully proved to be able to handle the following situations.
- The local transaction has not started at the time of the check-back
- The local transaction is still in progress at the time of the check-back
- The local transaction has been rolled back at the time of the check-back
- The local transaction has been committed t the time of the check-back
The detailed check-back principle is somewhat complex and has been patented, so it will not be described in detail here, for more information you can refer to https://en.dtm.pub/practice/msg.html
Multi-database support
With this solution, if you need to handle multiple databases, all you need to do is create the event tables for each database, and pass in different database connections where you check back.
Compared to the original polling table and listening binlog solution, the cost of operation and maintenance is greatly reduced. The load of the architecture is only related to the number of events and not to other factors such as the number of databases, making it more resilient.
More storage engine support
dtm's two-stage messages provide not only database support for DoAndSubmitDB
, but also NoSQL support
Mongo support
The following code ensures that both business and messages are committed simultaneously under Mongo
Go
err := msg.DoAndSubmit(busi.Busi+"/RedisQueryPrepared", func(bb *dtmcli.BranchBarrier) error {
return bb.MongoCall(MongoGet(), func(sc mongo.SessionContext) error {
return SagaMongoAdjustBalance(sc, sc.Client(), TransOutUID, -reqFrom(c).Amount, reqFrom(c).TransOutResult)
})
})
Redis support
The following code ensures that both the business and the message are committed simultaneously under Redis
Go
err := msg.DoAndSubmit(busi.Busi+"/RedisQueryPrepared", func(bb *dtmcli.BranchBarrier) error {
return bb.RedisCheckAdjustAmount(busi.RedisGet(), busi.GetRedisAccountKey(busi.TransOutUID), -30, 86400)
})
```''
The dtm check-back pattern can easily be extended to a wide variety of other transaction-enabled storage engines
## Solution features
The following features are available under two-phase messaging.
- Elegant support for multiple databases
- Support for not only SQL databases, but also NoSQL such as Mongo and Redis
- Short code, significantly less code than the usual outbox pattern
- The entire architecture and development process does not involve message queues, only api, making it easier to get started
- Load is only related to the volume of messages, not the number of databases involved
## Compare to RocketMQ transactional messages
Check-back was first proposed in RocketMQ transaction messages, but I has searched for examples of check-backs and various case studies, but has not found a check-back solution that handles all kinds of exceptions well. None of the solutions found correctly handle the "local transaction is still in progress" scenario, and all have corner cases that lead to inconsistent data, see [https://en.dtm.pub/practice/msg.html](https://en.dtm.pub/practice/msg.html) for more information .
In addition, dtm's two-stage messages do not require the introduction of a queue, or can be used in conjunction with other message queues, so they are more widely available
## Summary
The dtm two-stage messaging presented in this article is better suited to multi-database situations. The architecture solution, with its many advantages, is a perfect alternative to the outbox pattern and gives developers a simpler and easier to use architecture.
You are welcomed to visit [https://github.com/dtm-labs/dtm](https://github.com/dtm-labs/dtm)
Posted on July 12, 2022
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.