L. Lamport, R. Shostak and M. Pease. ACM Transactions onProgramming Languages and Systems, 4(3):382-401, July 1982
Notes by Indranil Gupta, March 08, 1999.
Adapted from
- original notes by Xun Wilson Huang.
- original notes by Lawrence Kesteloot.
Thanks to: Xun Wilson Huang, Lawrence Kesteloot.
Problem Statement: Byzantine GeneralsProblem(BGP)
The setting: There are n generals, one of them thecommanding general. Generals can send (and receive) messages fromother generals.
The problem: Develop a communication protocol for thecommanding general to send an order to the n-1 lieutenantgenerals so that
- All loyal lieutenants obey the same order
- If the commanding general is loyal, every loyal lieutenant obeys the order he sends.
Adversary: Any of the generals could be traitorsi.e., could send inconsistent messages regarding the order toother generals.
Impossibility Results
- For n = 3 generals and 1 traitor, there is no solution (protocol). This is because a loyal lieutenant cannot distinguish who is the traitor when he gets conflicting information from the commander and the other lieutenant. Let's call this the 3-Generals Problem.
- BGP for n < 3m+1 generals and m traitors can be reduced to the 3 - generals problem, with each of the Byzantine generals simulating at most m lieutenants and taking the same decision as the loyal lieutenants they simulate. Thus BGP for n < 3m+1 and m traitors is not solvable.
- Reaching approximation is as hard as reaching agreement.
I. A solution with oral messages for n > 3m
A solution for BGP with n > 3m and upto m traitors, isgiven.
Oral message system properties:
- A1. Every message that is sent is delivered correctly. -> No message loss.
- A2. The receiver of a message knows who sent it. -> Completely connected network with reliable links(due to A1).
- A3. The absence of a message can be detected. -> Synchronous system only.
Every general can send a message to every other general.
Solution in brief:
- uses a function majority which takes in a set of values and returns the value that is the majority among them (a possible implementation - median of the values).
- uses 'rounds' in each of which a general broadcasts the value he has received in the earler round to all the other generals through whom the value has not passed before he received it.
- when returning from the round, for each j, any two loyal lieutenants receive the same vector of values {v1, ... v(n-1)}. As the majority of the loyal lieutenants' values in these is ensured, applying the majority function on {v1, ... v(n-1)} to obtain vn preserves the above fact (that any two loyal lieutenants receive the same vector of values {v1, ... vn}). This ensures that BGP is solved.
Note: If the commanderis not a traitor, we can be done in 2 rounds. If the commander isa traitor, you may need upto m+1 rounds.
II. A solution with (unforgable) signed messages
The difficulty of BGP is in the ability of a traitorlieutenant to lie about the commander's order. If we can restrictthis ability by making the following assumptions, BGP is solvablewith any number of traitors as long as their maximum numberis known.
Signed messages:
- A4. In addition to the 3 assumptions made in the solution with oral messages, we add the following assumption.
- A loyal general's signature cannot be forged, any alteration can be detected. -> can drop a message, but can't change it
- Any one can verify the authenticity of a signature. -> no one can fool a general
Again, assume a fully connected message graph among thegenerals.
Solution in brief:
Uses a majority-like function called choice.
The solution:
- the commander sends a signed order to lieutenants
- if a lieutenant receives an order from some one (either from commander directly, or from other lieutenants), he verifies it and then puts it in a set V if it's not already there. Relay the order if there are less than m distinct signatures on the order.
- Everyone halts at round m+2, and use choice(V) as the desired action
The algorithm is to make all loyal lieutenants keep the sameset of V, thus choice(V) is the same. If the commander is loyal,the algorithm works because all loyal lieutenants have thecorrect order by round 1 and by unforgablity no more orders canbe produced. If the commander is not loyal, by running thealgorithm to round m+1, at least one loyal lieutenant will getthe order before round m( because there are only m traitors). Andthat loyal lieutenant will send it to all others. In short, ifone loyal lieutenant gets an order, all loyal lieutenants willget it in the next round.
III. IV. Relaxing the assumption on full-connectivity of thegenerals graph - extending above solutions
The previous 2 solutions can be extended to relax theassumption that the message graph among the generals is fullyconnected.
- Oral messages: Solution with oral messages is extended to solve BGP with upto m traitors in a p-regular graph with m>0 and p>3m-1.
- Unforgable messages: Earlier solution with signed messages solves BGP with upto m traitors in (m+d-1) rounds, where d is the diameter of the subgraph of loyal generals. Assumption here: subgraph of loyal generals is connected (this can be relaxed by relaxing the problem statement of BGP)
Practical use of BGP in building real life systems
The best way to provide faul-tolerant decision-making inredundant systems is by majority voting. A faulty input devicesmay generate meaningless inputs, but majority voting would ensurethat the same meaningless values are used.
For majority voting to yield a reliable system, the following2 conditions must be satisified
- All non-faulty processors must use the same input value
- If input unit is non-faulty, then all non-faulty processes use the value it provides
But these are just the requirements of the BGP !
So we can apply the above solutions to the BGP in real-life.Now what about the practicality of the assumptions made by thosesolutions ?
About A1: In real life, link failures occur. However, link failures are indistinguishable with failures of processors, therefore we can count the link failures as one of the m. Signed message is insensitive to link failures because no message can be forged even if links fail.
About A2: What is actually required is that no traitor can forge a non-faulty process' message. A2 not needed in the solution with signed messages.
About A3: In an asynchronous system, this condition cannot be satisfied. It is usually implemented via time-outs.
About A4: Signing message has 2 aspects:
- If processor is non-faulty, then no faulty processor can generate S(M). This can never be solved in real-life - only its probability of failure reduced.
- Given M and X, any one can verify if X == S(M). This is doable in real world.
Further Observations
- Optimizations for the BGP solution.
- combine messages to reduce the total number of messages.
- reduce the amount of information transferred.
- BGP required in the most general undecidable case of process failure.
- Solution presented is optimal because Fischer and Lynch have proved that any solution to the BGP necessarily has each lieutenant wait for a message that has passed through the hands of at least m generals after the commander.
- Solutions to clock synchronization (needed for the implementation of above BGP solutions) - very similar to the solutions for BGP.
- Further impossibility results
- BGP with messages transmitted arbitrarily quickly with upper bound on message transmission delay.
- Consensus with restricting traitors to fail-crash only.
- BGP works but is inherently expensive, especially in terms of number of messages O(m !). So it's a trade-off between performance and reliability. If you want more reliability in the most general failure conditions, you have to settle for a (costly) BGP solution. If, however, you can relax the failure conditions in your systems (ex. assume only fail-crash processes in a synchronous system and leave it to God to ensure that), you can go for cheaper solutions.
Critique and Questions
- Graph connectivity. Are p-regular topologies that frequent ? Can we extend the BGP solutions to any network topology ? Has it been extended to any other topologies ?
- Value of m: How would one obtain a reasonable value for maximum m in a practical system (note that this maximum number is required even in the solution with signed messages).
- Synchronous/asynchronous systems: How many synchronous system do we really use (SMP machines, and?) How about asynchronous systems ?
- Further work after this paper:
- What other solutions to BGP have been proposed after this paper ?
- Has any attempt been made to extend the BGP solutions to asynchronous systems to ensure 'some degree/probability' of reliability ?
- Answers in next section ;-)
- Bounds on best possible BGP solution (in terms of messages) ?
Further readings
Impossibility/necessity results
- Fischer, M. J., Lynch, N. A., and Paterson, M. S. ``Impossibility of Distributed Consensus with One Faulty Process,'' J. ACM 32, 2 (April 1985), 374--382.
- Dolev, D., Dwork, C., and Stockmeyer, L. ``On the Minimal Synchronism Needed for Distributed Consensus,'' J. ACM 34, 1 (January 1987), 77--97.
Approximate agreement
- Bracha, G. ``An O(log n) Expected Rounds Randomized Byzantine Generals Protocol,'' J. ACM 34, 4 (October 1987), 910--920.
- Bracha, G. and Toueg, S. ``Asynchronous Consensus and Broadcast Protocols,'' J. ACM 32, 4 (October 1985), 824--840.
- Ben-Or, M. ``Another Advantage of Free Choice: Completely Asynchronous Agreement Protocols,'' ACM Symposium on Principles of Distributed Computing, 1983, 27--30.
- Dolev, D., Lynch, N. A., Pinter, S. S., Stark, E. W., and Weihl, W. E. ``Reaching Approximate Agreement in the Presence of Faults,'' J. ACM 33, 3 (July 1986), 499--516.
- Dolev, D., Ruediger, R., and Strong, H. R. ``Early Stopping in Byzantine Agreement,'' J. ACM 37, 4 (October 1990), 720--741.
- Hadzilacos, V. and Halpern, J. Y. ``Message-Optimal Protocols for Byzantine Agreement,'' ACM Symposium on Principles of Distributed Computing, 1991, 309--323.
- Halpern, J. Y., Moses, Y., and Waarts, O. ``A Characterization of Eventual Byzantine Agreement,'' ACM Symposium on Principles of Distributed Computing, 1990, 333--346.
Failure detectors
- Chandra, T. D., Hadzilacos, V., and Toueg, S. ``The Weakest Failure Detector for Solving Consensus,'' ACM Symposium on Principles of Distributed Computing, 1992, 147--158.
- Chandra, T. D. and Toueg, S. ``Unreliable Failure Detectors for Asynchronous Systems,'' ACM Symposium on Principles of Distributed Computing, 1991, 325--340.