Bea.AI design blog

Distributed transaction



Two-phase commit (2PC) is an atomic commitment protocol for distributed transactions: it makes every participant in a transaction reach the same commit-or-abort outcome. Here's how it works:

  1. Prepare Phase: The transaction coordinator (also known as the transaction manager) sends a prepare request to all participants (also known as cohorts). Each participant determines if it can commit the transaction based on its local state and replies with either a "yes" vote (prepared to commit) or a "no" vote (unable to commit). Before voting "yes", a participant must durably record enough state to commit later, because after that vote it can no longer abort on its own.

  2. Commit Phase: If all participants vote "yes" during the prepare phase, the coordinator sends a commit request to all participants. Upon receiving the commit request, each participant applies the changes associated with the transaction and sends an acknowledgment back to the coordinator.

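  3. Abort Path: If any participant votes "no" during the prepare phase, or the coordinator times out waiting for a vote, the coordinator sends an abort request to all participants, and each participant rolls back the changes associated with the transaction. This takes the place of the commit phase; it is the other possible outcome of phase two, not a third phase. Once the coordinator has decided to commit, a missing acknowledgment does not trigger an abort: the coordinator keeps resending the commit request until every participant acknowledges it.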

Two-phase commit ensures that either all participants commit the transaction or none of them do, thereby maintaining atomicity across distributed systems.
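
The prepare and commit/abort flow described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production implementation: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names introduced here, and the sketch assumes no network failures, timeouts, or durable logging.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """In-memory stand-in for a resource manager (hypothetical)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # A real participant would acquire locks and durably log its vote here.
        if self.can_commit:
            self.state = "prepared"
            return Vote.YES
        self.state = "aborted"
        return Vote.NO

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if the transaction committed, False if it aborted."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = all(v is Vote.YES for v in votes)
    # Phase 2: deliver the single global decision to everyone.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

With two willing participants the call returns `True` and both end in the `committed` state; if any one is created with `can_commit=False`, the call returns `False` and all of them end in `aborted`.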

To keep the coordinator from being a single point of failure, and to avoid stale locks held by participants that are waiting on a decision, you can employ the following strategies:

  1. Coordinator Redundancy: Use multiple coordinators (coordinator replicas) in an active-passive or active-active configuration. If the primary coordinator fails, a standby coordinator can take over to ensure continuity of the protocol.

  2. Heartbeats and Timeouts: Implement heartbeat mechanisms between the coordinator and participants to detect failures promptly. Use timeouts to detect unresponsive participants or coordinators and initiate appropriate recovery actions.

  3. Persistent State: Ensure that the state of the protocol (e.g., transaction status, participant votes) is durably stored, either in a distributed database or on disk. This allows the system to recover from failures and resume protocol execution from the last known state.

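  4. Quorum-Based Decisions: When the coordinator is replicated, require a majority (quorum) of coordinator replicas to agree on the commit-or-abort decision, for example through a consensus protocol such as Paxos or Raft, so that a network partition cannot produce two conflicting decisions (split-brain). The quorum applies to the coordinator replicas only; committing still requires a "yes" vote from every participant.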

  5. Retry and Backoff: Implement retry and exponential backoff strategies to handle transient failures gracefully. Retrying failed operations with increasing delays can help mitigate the impact of temporary network or system failures.
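
Two of the strategies above, persistent state and retry with backoff, can be sketched together. This is a rough illustration under stated assumptions: the JSON-lines log format and the function names `log_decision`, `recover_decisions`, and `deliver_with_backoff` are hypothetical, and `ConnectionError` stands in for whatever transient failure a real transport would raise.

```python
import json
import os
import time


def log_decision(path, txn_id, decision):
    """Append the decision and force it to disk before telling anyone."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"txn": txn_id, "decision": decision}) + "\n")
        f.flush()
        os.fsync(f.fileno())


def recover_decisions(path):
    """Rebuild txn -> decision from the log after a coordinator restart."""
    decisions = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                decisions[record["txn"]] = record["decision"]
    return decisions


def deliver_with_backoff(send, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call send() until it succeeds, doubling the delay after each failure."""
    for attempt in range(max_attempts):
        try:
            return send()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller decide what to do
            sleep(base_delay * (2 ** attempt))
```

The ordering is the point of the sketch: the decision is written and synced before any commit or abort message is sent, so a restarted coordinator can read the log and resume delivery instead of leaving participants holding locks indefinitely.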

The main differences between 2PC and the Saga pattern:

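  • Two-Phase Commit (2PC): 2PC is a synchronous protocol that ensures all participants either commit or abort the transaction atomically. It requires strict coordination and is blocking: participants that voted "yes" hold their locks until they learn the coordinator's decision, so a slow or failed coordinator stalls them. Long-held locks also increase contention and the risk of deadlocks, especially in large distributed systems.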

  • Saga: Saga is an alternative approach to handling distributed transactions, particularly in long-running and loosely coupled systems. A saga is a sequence of local transactions, each committed independently by one service and each paired with a compensating action that semantically undoes it. If a step fails, the compensating actions for the steps that already completed are run in reverse order. Steps and compensations should be designed to be idempotent so they can be retried safely. Unlike 2PC, sagas are typically asynchronous, hold no locks across services, and do not provide isolation: other transactions can observe intermediate states before the saga finishes or is compensated.
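
The saga flow can be sketched as a list of (action, compensation) pairs. This is a minimal synchronous illustration; `run_saga` is a hypothetical helper, and a real saga would typically be driven by messages or an orchestrator, with retries and persisted progress.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, run the compensations of the steps that already
    completed, in reverse order, and return False.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True
```

For example, if the steps are "reserve stock", "charge card", and "ship", and "ship" raises, the helper runs "refund card" and then "release stock" before returning `False`. The failed step itself is not compensated, because it never completed.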

In summary, while 2PC provides stronger atomicity guarantees, sagas trade them for eventual consistency and offer more flexibility and scalability for managing distributed transactions in large-scale and heterogeneous environments. The choice between them depends on the specific requirements and constraints of the system architecture and transactional workflows.