Consensus

Consensus in human systems is actually usually pretty easy because of the social layer of society. This fault tolerance against 51% attacks is due to the fact that convincing the community that any engineered ‘truth’ is the real on requires subverting every trusted member in the community, most notably media and news sources (also why systems of authoritarian power are so scary).

A difficult problem for governance within communities

Note that this is inherently different from collaboration methods like CRDTs. Collaboration involves keeping all edits and merging them. Consensus involves picking one of several proposed values and agreeing on it.

Example applications include: SMR, Byzantine Agreement

Consensus and Humming in the IETF

Source: IETF

On rough consensus

Humming as ‘temp checks’ for people to voice disagreement but default assumption is optimistic trust
“While counting heads might give a good guess as to what the rough consensus will be, doing so can allow important minority views to get lost in the noise. One of the strengths of a consensus model is that minority views are addressed, and using a rough consensus model should not take away from that.”
“We can’t know who the “members” of any given working group would be at any one time, and we certainly can’t know who all of the “members” of the IETF would be: That’s why we refer to “participants” in the IETF; the IETF doesn’t really have “members”. Indeed, we often recruit additional implementers and other experts into working groups in order to ensure that broader views are brought into the discussion. So, voting is simply not practical.”

Algorithmic Consensus

There are four requirements to such an algorithm:

Validity. The result must be a value that was submitted by at least one of the processes. The consensus algorithm cannot just make up a value.
Uniform agreement. All nodes must select the same value.
Integrity. A node can select only a single value. That is, a node cannot announce one outcome and later change its mind.
Termination. Also known as progress, every node must eventually reach a decision.

There are two main protocol paradigms for achieving consensus in the presence of Byzantine nodes:

Classic BFT protocols: typically uses two voting rounds to ensure consistency
1. One phase to guarantee proposal uniqueness using a quorum certificate of $n - f$ votes
2. The other phase is to convince replicas that the leader is safe to propose new entries
3. Examples include: Tendermint, Tangaroa, HotStuff, PBFT
Longest-chain consensus
1. Examples include: most consensus mechanisms for cryptocurrencies like Bitcoin, Ethereum

	Classic BFT	Longest-chain Consensus
Safety/Liveness tradeoff	Favours safety in the face of an attack	Favour liveness in the face of an attack
Finality	Instant and deterministic	Probabilistic (at risk of potentially large chain reorganizations and double-spend attacks)
Fork behaviour	Rare but difficult to recover from	Embrace forks, uses in-protocol methods for resolving ambiguity as to which fork is correct
FLP Result Behaviour	sacrifice either liveness or consistency in the face of an attack (assuming <33% Byzantine as per FLP Result)	Does not apply as longest-chain consensus is non-deterministic
Permission model	Generally permissioned (see Sandglass)	Permissionless

Note that there have been attempts to bridge Classic BFT models with Nakamoto-style consensus ones with hybrid consensus models which use a permissionless chain to determine a participant/proposer rotation in a reconfigurable BFT engine.

Comparisons between different BFT SMR protocols

All protocols are of the following:

protocols for byzantine fault-tolerant SMR
All work in the partially synchronous system model and obtain safety (always) and liveness (after GST) in the face of an adversary that controls $f$ replicas out of a total of $n = 3 f + 1$ replicas (per FLP Result)
All these protocols are based on the classic leader-based primary-backup approach where leaders are replaced in a view-change (or election to use Raft terminology) protocol.

Below is a comparison of a few top protocols and their tradeoffs in authenticator complexity

	Best-case Latency (rounds)	Normal-case Communication	View-change Communication	Leader Rotation	Responsiveness
PBFT	2	$O (n^{2})$	$O (n^{3})$	On suspected fault	Yes
Tendermint	2	$O (n)$ using thresholded signatures, $O (n^{2})$ otherwise	$O (n)$	Every round	No
SBFT	1	$O (n)$	$O (n^{2})$	On suspected fault	Yes
HotStuff	3	$O (n)$	$O (n)$	Every round	Yes

Responsiveness here refers to the property that a non-faulty leader can drive the protocol to consensus in a time depending on actual message delays and not the theoretical upper bound on message transmission delays. In partially synchronous models, we use optimistic responsiveness, which requires responsiveness only after GST is reached.

Leader rotation tradeoff:

maintaining a stable leader means less overhead and better performance due to stability when the leader is honest and trusted
constantly rotating the leader provides a stronger fairness guarantee against stable malicious leaders

Pipelining

In PBFT, SBFT, and HotStuff the leader maintains a window of open slots and is allowed to concurrently work on committing all open slots in his active window. Conceptually, this is like TCP where a sender does not have to wait for the ACK of packet $i$ before sending message $i + 1$ . This window can significantly increase throughput by allowing the leader to concurrently coordinate several actions of slot commitments.

Impossibility Results

When consensus is impossible to achieve:

jzhao.xyz

Recent Writing

2024: Centering

Taste is a guide for what is worthwhile

Agentic Computing

Building a BFT JSON CRDT

Recent Notes

TrueTime

Concurrency control