# Distributed transaction

Two-phase commit (2PC) is a protocol for committing a transaction atomically across multiple nodes. Here's how it works:

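1. **Prepare Phase**: The transaction coordinator (also known as the transaction manager) sends a prepare request to all participants (also known as cohorts). Each participant checks whether it can commit the transaction based on its local state. If it can, it durably records the pending changes, keeps holding the locks it needs, and replies with a "yes" vote (prepared to commit); otherwise it replies with a "no" vote (unable to commit). A "yes" vote is a promise: the participant can no longer abort on its own.
2. **Commit Phase**: If all participants vote "yes" during the prepare phase, the coordinator durably records the commit decision and sends a commit request to all participants. Upon receiving the commit request, each participant applies the changes associated with the transaction, releases its locks, and sends an acknowledgment back to the coordinator. Once recorded, the decision is final: if an acknowledgment is missing, the coordinator retries the commit request until the participant confirms, rather than switching to abort.
3. **Abort (alternative outcome of the second phase)**: Abort is not an additional phase but the other possible result of the second one. If any participant votes "no" during the prepare phase, or if the coordinator times out while waiting for a vote, the coordinator records an abort decision and sends an abort request to all participants. Each participant then rolls back the changes associated with the transaction and releases its locks.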

Two-phase commit ensures that either all participants commit the transaction or none of them do, thereby maintaining atomicity across distributed systems.
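
The flow above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production implementation: the `Participant` and `Coordinator` classes, the `can_commit` flag, and the in-memory `log` list are all hypothetical stand-ins for real resource managers, local validation, and a durable decision log, and network failures are ignored entirely.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class State(Enum):
    INIT = "init"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


@dataclass
class Participant:
    """Hypothetical participant; can_commit stands in for a local check."""
    name: str
    can_commit: bool = True
    state: State = State.INIT

    def prepare(self) -> bool:
        # Vote "yes" only if the local changes could be committed.
        if self.can_commit:
            self.state = State.PREPARED
            return True
        self.state = State.ABORTED
        return False

    def commit(self) -> None:
        self.state = State.COMMITTED

    def abort(self) -> None:
        self.state = State.ABORTED


@dataclass
class Coordinator:
    """Hypothetical coordinator; log stands in for a durable decision log."""
    participants: List[Participant]
    log: List[Tuple[str, State]] = field(default_factory=list)

    def run(self, tx_id: str) -> State:
        # Phase 1: ask every participant to prepare and collect the votes.
        votes = [p.prepare() for p in self.participants]
        # Commit requires a unanimous "yes"; any "no" means abort.
        decision = State.COMMITTED if all(votes) else State.ABORTED
        # Record the decision before announcing it to anyone.
        self.log.append((tx_id, decision))
        # Phase 2: deliver the same decision to every participant.
        for p in self.participants:
            if decision is State.COMMITTED:
                p.commit()
            else:
                p.abort()
        return decision
```

For example, `Coordinator([Participant("orders"), Participant("payments", can_commit=False)]).run("tx-1")` returns `State.ABORTED` and leaves both participants in the `ABORTED` state, because a single "no" vote is enough to abort the whole transaction.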

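The coordinator is a single point of failure: participants that have voted "yes" must keep their locks until they learn the decision, so a coordinator crash can leave stale locks behind. To reduce these risks, you can employ the following strategies: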

1. **Coordinator Redundancy**: Use multiple coordinators (coordinator replicas) in an active-passive or active-active configuration. If the primary coordinator fails, a standby coordinator can take over to ensure continuity of the protocol.
2. **Heartbeats and Timeouts**: Implement heartbeat mechanisms between the coordinator and participants to detect failures promptly. Use timeouts to detect unresponsive participants or coordinators and initiate appropriate recovery actions.
3. **Persistent State**: Ensure that the state of the protocol (e.g., transaction status, participant votes) is durably stored, either in a distributed database or on disk. This allows the system to recover from failures and resume protocol execution from the last known state.
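4. **Quorum-Based Decisions**: Replicate the coordinator's decision through a quorum-based consensus protocol such as Paxos or Raft, so that a majority of coordinator replicas must agree on the outcome before it takes effect. This prevents split-brain scenarios in which two coordinators reach different decisions. The quorum applies to the coordinator replicas only: committing still requires a "yes" vote from every participant, not just a majority.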
5. **Retry and Backoff**: Implement retry and exponential backoff strategies to handle transient failures gracefully. Retrying failed operations with increasing delays can help mitigate the impact of temporary network or system failures.
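
As an illustration of the retry-and-backoff strategy, the sketch below retries an operation with exponentially growing, jittered delays. The helper name `retry_with_backoff`, its parameters, and the choice of `ConnectionError` as the transient failure are assumptions made for this example; a real system would pick the exceptions and limits that match its transport.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: retry operation on ConnectionError."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            # Give up after the last attempt and surface the error.
            if attempt == max_attempts - 1:
                raise
            # The upper bound doubles each attempt and is capped at max_delay;
            # the actual delay is drawn at random below it (jitter).
            upper = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, upper))
    raise ValueError("max_attempts must be at least 1")
```

In a 2PC setting, a coordinator that has already recorded a commit decision could wrap the delivery of the commit request in such a helper. This only works safely if the participant's commit handler is idempotent, since the same request may arrive more than once.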

The main differences between 2PC and the Saga pattern are:

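* **Two-Phase Commit (2PC)**: 2PC is a synchronous protocol that ensures all participants either commit or abort the transaction atomically. It requires strict coordination and is a blocking protocol: participants hold locks from the moment they vote "yes" until they learn the decision, so a slow or failed coordinator stalls them. These long-held locks also increase contention and the risk of deadlock, especially in large distributed systems.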

<figure><img src="/files/pi3LAAsIlnQcyWgywdvf" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/MXpceyjdG66aR2epAAd2" alt=""><figcaption></figcaption></figure>

* **Saga**: Saga is an alternative approach to handling distributed transactions, particularly in long-running and loosely coupled systems. A saga is a sequence of local transactions, each paired with a compensating action that semantically undoes it. Every local transaction commits independently; if a later step fails, the compensating actions of the already completed steps are executed in reverse order. Steps and compensations should be idempotent so that they can be retried safely. Unlike 2PC, sagas are typically asynchronous and hold no locks across services, and they can be coordinated either by a central orchestrator or in a decentralized way through events (choreography). The trade-off is weaker isolation: intermediate results are visible to other transactions before the saga completes.

  <figure><img src="/files/NxWd83goJShMAwoo80ae" alt=""><figcaption></figcaption></figure>
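
The compensation idea can be illustrated with a small orchestration-style sketch. The `SagaStep` type and the `run_saga` function are hypothetical names introduced for this example, and the sketch assumes that compensations themselves always succeed, which a real system cannot take for granted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    """Hypothetical step: a local transaction plus its compensating action."""
    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]


def run_saga(steps: List[SagaStep]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action()
            completed.append(step)
        except Exception:
            # The failed step did not complete, so only earlier steps are undone.
            for done in reversed(completed):
                done.compensate()
            return False
    return True
```

For instance, with the steps "reserve stock", "charge payment", and "ship order", a failure while charging the payment would trigger only the compensation for the stock reservation, since shipping never started. Between the reservation and its compensation, other transactions can observe the reserved stock, which is the isolation gap that 2PC avoids by holding locks.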

In summary, 2PC provides atomic commit across all participants at the cost of blocking and tight coupling, while sagas trade isolation for flexibility and scalability, reaching a consistent state eventually through compensation. Sagas therefore tend to fit large-scale and heterogeneous environments better. The choice between them depends on the specific requirements and constraints of the system architecture and transactional workflows.


