[Feature Request] Diem Forensics #7133

simplespy · 2021-01-06T04:12:22Z

🚀 Feature Request

(We write this request in the form of a DIP).

Summary

This DIP describes a forensic protocol and implementation that can irrefutably detect Byzantine validators in the rare case that there is a safety violation in Diem.

LibraBFT allows the Diem Blockchain to tolerate up to one-third Byzantine validators within the validator network under the partially synchronous network setting [1]. Let $n$ be the total number of validators in the Diem blockchain. In the rare case that the number of Byzantine validators $f$ exceeds $t=\frac{n}{3}$, security and liveness could be violated. This DIP focuses on the "forensic support" for safety violations: how to identify as many of the Byzantine validators by as many honest witnesses in as distributed manner as possible. We demonstrate a concrete protocol that when $f$ exceeds $t$, at least $t+1$ of culpable validators can be identified (with cryptographic integrity) by at least $2t+1−f$ honest validators individually.

Motivation

An important property satisfied by any Byzantine fault tolerant consensus protocol is agreement, which requires honest parties to not decide on conflicting values. Depending on the network model, typical consensus protocols tolerate only a fraction of Byzantine parties. In particular, DiemBFT can tolerate $t=\frac{n}{3}$ Byzantine validators. If the number of Byzantine validators exceeds this threshold, the protocols do not provide safety (or liveness).

Motivated by situations where $\ge\frac{n}{3}$ Byzantine validators have successfully mounted a safety attack, this DIP focuses on providing forensic support to identify the validators that acted maliciously. Specifically, we demonstrate that strong forensic support can be provided for DiemBFT as presently implemented (no changes necessary): there exist $2t+1-f$ honest validators who can individually identify at least $t+1$ Byzantine validators and provide irrefutable evidence to an external client.

Safety Violation on Diem

BFT assumption

The safety of Diem is based on a key assumption: if there are $f$ Byzantine validators, there exist $2f + 1$ honest validators; so $f < \frac{n}{3}$. Mathematically, the following two lemmas underpin the safety guarantee.

There can only be one certified block per round.
Two blocks committed in different rounds must belong to the same chain.

Informally, the first lemma holds because Diem ensures that an honest validator only votes once in a round. Therefore, when the adversary does not corrupt more than $t$ validators, there cannot be two conflicting quorum certificates (QCs) within a round, since two conflicting QCs indicate $t+1$ validators vote for two conflicting blocks, which means $t+1$ validators are corrupted.

For the second argument, when the adversary does not corrupt more than $t$ validators, Diem ensures safety across rounds by a simple voting rule. Whenever validators receive a block, they maintain a preferred round, defined as the highest known grandparent round (also called a lock). The rule is that validators vote for a block if its parent round is at least the preferred round [1]. If a QC commits block $B$ in round $e+2$ (the QC is called commitQC for block $B$), at least $2t+1$ validators remember $e$ as their preferred rounds. In rounds higher than $e$, there could not be a QC on blocks that are not descendants of $B$, since such a quorum certificate means $t+1$ validators (among $2t+1$ validators) who lock on $(B,e)$ vote for block $B'$ on a conflicting chain as $B$.

Safety violation

When the assumption does not hold, there are exactly two possible types of safety violations (Figure 1).

Safety violation within a round: two different blocks are committed in a round
Safety violation across rounds: two blocks that belong to two conflicting chains are committed in different rounds.

Figure 1. Two types of safety violations.

For the first case, when the adversary corrupts $\geq t+1$ validators, they can vote for two conflicting blocks, $B$ and $B'$ in the same round. Let us split $2t$ honest validators into two sets $P$ and $Q$. If the leader is corrupted, it can equivocate to validators in $P$ and $Q$. As a result, validators in $P$ vote for $B$ and validators in $Q$ vote for $B'$. Two conflicting blocks thus both get $2t+1$ votes, which are enough to form two conflicting QCs.

For the second case, when the adversary corrupts $\geq t+1$ validators, these validators can vote for a conflicting block $B'$ despite having a preferred round $e$ on block $B$. Again, let us split $2t$ honest validators into two sets $P$ and $Q$. Validators in $P$ vote for block $B$ in round $e+2$ and also lock on it whereas validators in $Q$ are not aware of block $B$ or the corresponding lock. Then in some higher round, a leader may propose a conflicting block $B'$ with an outdated QC. Validators in $Q$ are not aware of the lock on $B$ so they can vote for $B'$ along with the $t+1$ corrupted validators who violate the voting rule. As a result, the leader can collect $2t+1$ votes to form a QC in the higher round (also called a prepareQC). This QC is sufficient to unlock any honest validator and subsequently validators can commit a conflicting block.

Forensic Support for Diem

In the analysis of safety violation above, we observe that the corrupted validators must send certain messages in order to break safety: 1) to break safety within a round, corrupted validators vote for two conflicting blocks in the round, 2) to break safety across rounds, they vote for a conflicting block despite having a lock on a conflicting block (thus violating the voting rule). Those messages are signed by their secret keys and hence can serve as irrefutable evidence of their misbehavior. In addition, no honest validator will vote twice within a round or violate the voting rule, therefore we will only hold corrupted validators culpable.

Armed with this intuition, the forensic protocol for DiemBFT is the following:

To identify disagreement and trigger the forensic protocol, two conflicting commitQCs need to be provided to the forensic protocol as input.
If two commitQCs are within a round, the validators who vote for both commitQCs are corrupted validators.
If two commitQCs are across rounds, we denote by commitQC₁ the lower one and commitQC₂ the higher one. We query all validators for a prepareQC (denoted by prepareQC^#) such that it is later than commit₁, the block $B'$ it votes for conflicts with the block $B$ of commitQC₁, and the parent of $B'$ is before $B$. The validators who vote for commitQC₁ and prepareQC^# are corrupted validators.

There are $n=3t+1$ validators in the system. In both of the cases, the process of identifying culpable Byzantine validators involves performing appropriate quorum intersections: since two quorums of $2t+1$ validators intersect in $t+1$ validators, we are able to identify $\geq t+1$ Byzantine validators. Who are the witnesses? For conflicting commits within a round, the commitQCs themselves can prove culpability. For conflicting commits across rounds, honest validators having access to prepareQC^# are witnesses. It turns out that the existence of commit₂ implies that $2t+1$ validators should have received prepareQC^#, out of which $f$ of them are Byzantine. The remaining $2t+1-f$ validators can act as witnesses. This also implies that the forensic support holds only when $f < 2t+1$; when the number of Byzantine validators are higher, we may have no witnesses.

An example attack is depicted in Figure 2, where conflicting commits across rounds occur; here $t+1$ red validators are corrupted. An honest validator commits $B$ in round $e$ on receiving commitQC₁. The formation of commitQC₁ indicates blue validators are locked on $(B,e)$. Some rounds later, a malicious leader proposes a conflicting block $B'$ and sends the proposal to another set of honest validators. According to the voting rule, green validators will vote and a higher prepareQC^# for $B'$ is formed. After two rounds of voting, $B'$ is committed by another honest validator. In this example, red validators are held accountable since they vote for both commitQC₁ and prepareQC^#. Blue and green validators have access to the evidence.

Figure 2. An attack in action, where red validators are held culpable for sending stale prepareQC during view change, blue and green validators are witnesses.

Specification

The forensic module can be added on top of Diem without touching the existing codebase. It consists of two components, a database (FORENSIC_STORE) used to store quorum certificates received by validators, which can be accessed by clients through JSON-RPC requests, and an independent DETECTOR run by clients to analyze the forensic information.

Figure 3. Forensic module on top of Diem architechture.

A demo dashboard below shows the information collected by the forensic protocol and the analysis results.

Figure 4. Dashboard to display forensic information.

Implementation

A reference implementation is available at the following URL: https://github.com/wgr523/libra

Compatibility

The forensic module does not conflict with existing versions of Diem. A Diem validator that supports forensics appears no different within the Diem payment network (DPN) to Diem validators running older software versions. Diem validators that support forensics can receive RPC requests from clients and reply with messages. Obviously, older validators are not capable of replying to forensic requests. If a client who wants to use forensic features has no peers that support forensics, then it cannot gather enough information to have forensic support thus naturally decays to older clients.

References

State Machine Replication in the Diem Blockchain. https://developers.diem.com/docs/technical-papers/state-machine-validatortion-paper/
BFT Protocol Forensics. https://arxiv.org/abs/2010.06785

The text was updated successfully, but these errors were encountered:

dahliamalkhi · 2021-01-10T15:39:41Z

Thanks @simplespy for an excellent description and an interesting proposal!

@aching @davidiw @zekun000 @dmitri-perelman any comments?

aching · 2021-01-11T08:31:32Z

Thank you for the very detailed proposal and sample implementation @simplespy!

We've been thinking along the lines of how to identify byzantine behavior (even before a safety violation actually occurs) and how to report it. The obvious cases are both safety related (e.g double voting, voting on non-increasing rounds, etc.) and liveness related (e.g. proposing without a valid round proof).

This proposal fits in with this area of work. We'll want to study how to put this all together into a holistic forensics protocol that does its best to avoid slowing down performance (for instance - maybe async writes to the db for example) and the right place for the forensic APIs (e.g. forensic_get_latest_round and forensic_get_quorum_cert_at_round) as validators do not expose this JSON-RPC APIs outside of the validator network.

cc @mimoo - who was thinking about Byzantine logging

Perhaps we can follow up with some thoughts on how this whole area can make progress in a few days?

simplespy added the enhancement New feature or request label Jan 6, 2021

mimoo assigned aching, mimoo and zekun000 Jan 12, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature Request] Diem Forensics #7133

[Feature Request] Diem Forensics #7133

simplespy commented Jan 6, 2021

dahliamalkhi commented Jan 10, 2021

aching commented Jan 11, 2021

[Feature Request] Diem Forensics #7133

[Feature Request] Diem Forensics #7133

Comments

simplespy commented Jan 6, 2021

🚀 Feature Request

Summary

Motivation

Safety Violation on Diem

BFT assumption

Safety violation

Forensic Support for Diem

Specification

Implementation

Compatibility

References

dahliamalkhi commented Jan 10, 2021

aching commented Jan 11, 2021