Distributed Transaction
date
Jun 20, 2021
tags
Programming
COMP90020
- Fundamentals
- Coordination in distributed transaction
- Two-Phase Commit (2PC)
- Three-Phase commit (3PC)
- Concurrency control in distributed transactions
- Locking
- Timestamp ordering
- Transaction Recovery
Fundamentals
- A distributed transaction is a transaction in which more than one server is involved.
- Challenges with distributed Txs:
  - Single Tx: consensus among different servers
  - Multiple Txs: consensus among different servers + global concurrency control
- Atomicity
  - Either all events of a Tx take effect or none of them do (all or nothing).
  - Consensus is required among the different servers.
- Coordination in distributed consensus
  - One server, the server that opens the Tx, becomes the coordinator for the Tx.
  - Coordinator
    - Maintains a list of participating servers, i.e., servers that hold objects involved in the Tx
    - Collects information from participants and makes the final decision
  - Participants
    - Know the coordinator
    - Report to the coordinator and follow its decision
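The coordinator and participant roles above can be sketched as a vote-then-decide exchange. This is a minimal in-memory sketch, not a full protocol: the class names, the `can_commit` flag, and the state strings are assumptions made for illustration, and crashes, timeouts, and message passing are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """A server holding objects involved in the Tx (hypothetical model)."""
    name: str
    can_commit: bool = True   # how this server will vote
    state: str = "active"     # active -> prepared -> committed / aborted

    def vote(self) -> bool:
        # Report to the coordinator whether this server can commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, decision: str) -> None:
        # Follow the coordinator's decision.
        self.state = decision


@dataclass
class Coordinator:
    """The server that opened the Tx; it keeps the participant list."""
    participants: list = field(default_factory=list)

    def join(self, p: Participant) -> None:
        self.participants.append(p)

    def run(self) -> str:
        # Collect every vote first, so that all participants are asked.
        votes = [p.vote() for p in self.participants]
        # All or nothing: commit only if every participant voted yes.
        decision = "committed" if all(votes) else "aborted"
        for p in self.participants:
            p.finish(decision)
        return decision
```

A single "no" vote is enough to abort the Tx on every server, which is what gives the all-or-nothing behaviour.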
Coordination in distributed transaction
Two-Phase Commit (2PC)
- DON’T CONFUSE 2PC AND 2PL
- Two-phase commit (2PC) and two-phase locking (2PL) are two very different things. 2PC provides atomic commit in a distributed database, whereas 2PL provides serializable isolation.
Three-Phase commit (3PC)
3PC has one more phase than 2PC: a pre-commit phase.
Two-phase commit is called a blocking atomic commit protocol due to the fact that 2PC can become stuck waiting for the coordinator to recover. In theory, it is possible to make an atomic commit protocol nonblocking, so that it does not get stuck if a node fails. However, making this work in practice is not so straightforward.
As an alternative to 2PC, an algorithm called three-phase commit (3PC) has been proposed. However, 3PC assumes a network with bounded delay and nodes with bounded response times; in most practical systems with unbounded network delay and process pauses, it cannot guarantee atomicity.
In general, nonblocking atomic commit requires a perfect failure detector—i.e., a reliable mechanism for telling whether a node has crashed or not. In a network with unbounded delay a timeout is not a reliable failure detector, because a request may time out due to a network problem even if no node has crashed. For this reason, 2PC continues to be used, despite the known problem with coordinator failure.
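To see why a timeout is not a reliable failure detector, consider the toy model below. The function name, the delay values, and the use of `None` for "never replied" are all assumptions made for illustration.

```python
def suspect_crashed(reply_delay, timeout):
    """Timeout-based detector: suspect the node if no reply arrived in time.

    reply_delay is how long the reply took in seconds, or None if the node
    really crashed and never replied (hypothetical model).
    """
    return reply_delay is None or reply_delay > timeout


TIMEOUT = 1.0
crashed_node = None   # never replies
slow_node = 5.0       # alive, but the network delayed its reply
healthy_node = 0.2

results = {
    "crashed": suspect_crashed(crashed_node, TIMEOUT),
    "slow": suspect_crashed(slow_node, TIMEOUT),
    "healthy": suspect_crashed(healthy_node, TIMEOUT),
}
```

The detector gives the same answer for the crashed node and for the slow but alive node. With unbounded delay, no choice of `TIMEOUT` removes this ambiguity.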
Concurrency control in distributed transactions
In centralised transaction processing, the goal is to guarantee serial equivalence.
In distributed Txs, concurrency control is modular: each server is responsible for the serialisability of the transactions that access its own objects.
In other words, if Tx T happens before U from server X's point of view, we need to make sure that they happen in the same order from server Y's point of view.
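One way to picture this requirement is to merge the per-server orders into a single precedence graph and check it for cycles. The sketch below assumes each server can report its local serialisation order as a list of Tx names; that input format and the function name are hypothetical.

```python
def globally_serialisable(local_orders):
    """Return True if the per-server orders agree on one global order.

    local_orders maps a server name to the order in which that server
    serialised the Txs that touched its objects (hypothetical input format).
    """
    # Precedence graph: an edge T -> U means some server ordered T before U.
    graph = {}
    for order in local_orders.values():
        for tx in order:
            graph.setdefault(tx, set())
        for earlier, later in zip(order, order[1:]):
            graph[earlier].add(later)

    # The orders are compatible exactly when the graph has no cycle.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {tx: WHITE for tx in graph}

    def has_cycle(tx):
        colour[tx] = GREY
        for nxt in graph[tx]:
            if colour[nxt] == GREY:
                return True
            if colour[nxt] == WHITE and has_cycle(nxt):
                return True
        colour[tx] = BLACK
        return False

    return not any(colour[tx] == WHITE and has_cycle(tx) for tx in graph)
```

For example, `{"X": ["T", "U"], "Y": ["U", "T"]}` is rejected: X orders T before U while Y orders U before T, so no single serial order satisfies both servers.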
Locking
- Locks are used locally by each server on its own objects.
- Deadlock detection: edge-chasing
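Edge-chasing can be sketched as a probe that follows wait-for edges and reports a deadlock when it returns to a Tx it has already visited. In the sketch below a single dict stands in for edges that would really be spread across servers. That simplification, the function name, and the assumption that each Tx waits for at most one other Tx are mine.

```python
def chase_edges(wait_for, start):
    """Follow wait-for edges from `start`; return the cycle if one is found.

    wait_for maps a blocked Tx to the Tx holding the lock it needs. In a real
    system each edge is known only to one server and the probe travels between
    servers as a message; a single dict stands in for that here.
    """
    probe = [start]                 # the probe carries the path so far
    current = start
    while current in wait_for:      # the current Tx is itself blocked
        current = wait_for[current]
        if current in probe:        # the probe came back: deadlock
            return probe[probe.index(current):] + [current]
        probe.append(current)
    return None                     # reached a Tx that is not waiting
```

Once a cycle is reported, one Tx in it has to be aborted to break the deadlock. How the victim is chosen is outside this sketch.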
Timestamp ordering
- The coordinator assigns each Tx a globally unique timestamp.
- The rest of the implementation is the same as timestamp ordering in the centralised Txs version.
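A common way to make timestamps globally unique is to pair a local counter with the server's identifier. The sketch below assumes that scheme; the class name and the string server ids are hypothetical, and it makes no attempt to keep the counters of different servers in step.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class TimestampSource:
    """Issues (counter, server_id) pairs for Txs opened at one coordinator."""
    server_id: str
    _counter: count = field(default_factory=lambda: count(1))

    def next_timestamp(self):
        # Tuples compare element by element, so the server id breaks ties
        # between timestamps that carry the same counter value.
        return (next(self._counter), self.server_id)
```

Two coordinators can issue the same counter value, but the resulting pairs still differ and still have a total order, which is all that timestamp ordering needs.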
Transaction Recovery
- Purpose: to satisfy durability and atomicity.
- Goal: after a crash, the server can be restored with the latest committed version of all of its objects, so that durability and atomicity are satisfied.
- Both requirements can be met by introducing a recovery manager.
- What is the recovery manager?
  - Regularly saves objects to permanent storage (in a “recovery file”) for committed Txs
  - Restores the server’s objects after a crash
  - (Reorganises the recovery file to improve the performance of recovery)
  - (Reclaims storage space)
- The recovery file contains the current status of Txs and objects.
- Recovery of 2PC
  - The coordinator and the participants write additional entries to their own recovery files.
  - When the coordinator is prepared to commit (and has already added a prepared entry to its recovery file), a coordinator entry is added to its recovery file.
  - When a participant is ready to vote Yes, a participant entry is added to its recovery file, together with an uncertain status.
- Recovery from crash
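The recovery file and the restore step can be sketched as an append-only list of entries that is replayed after a crash. The entry formats, the function name, and the assumption that entries for the same object appear in commit order are all hypothetical; a real recovery manager would also handle checkpoints and file reorganisation.

```python
def restore(recovery_file):
    """Rebuild the object store from a recovery file (a list of entries).

    Hypothetical entry formats:
      ("object", tx, name, value)  a value written by tx
      ("status", tx, state)        state is "prepared", "committed" or "aborted"
    """
    # The last status entry for each Tx is its current status.
    status = {}
    for entry in recovery_file:
        if entry[0] == "status":
            _, tx, state = entry
            status[tx] = state

    # Replay object entries in order, keeping only committed Txs' values.
    objects = {}
    for entry in recovery_file:
        if entry[0] == "object":
            _, tx, name, value = entry
            if status.get(tx) == "committed":
                objects[name] = value

    # Txs still "prepared" are uncertain: they must ask the coordinator.
    uncertain = sorted(tx for tx, s in status.items() if s == "prepared")
    return objects, uncertain
```

Values written by aborted or still-uncertain Txs are never restored, which keeps atomicity. Values of committed Txs always are, which gives durability.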