Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Diem Forensics #7133

Open
simplespy opened this issue Jan 6, 2021 · 2 comments
Open

[Feature Request] Diem Forensics #7133

simplespy opened this issue Jan 6, 2021 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@simplespy
Copy link

馃殌 Feature Request

(We write this request in the form of a DIP).

Summary

This DIP describes a forensic protocol and implementation that can irrefutably detect Byzantine validators in the rare case that there is a safety violation in Diem.

LibraBFT allows the Diem Blockchain to tolerate up to one-third Byzantine validators within the validator network under the partially synchronous network setting [1]. Let $n$ be the total number of validators in the Diem blockchain. In the rare case that the number of Byzantine validators $f$ exceeds $t=\frac{n}{3}$, security and liveness could be violated. This DIP focuses on the "forensic support" for safety violations: how to identify as many of the Byzantine validators by as many honest witnesses in as distributed manner as possible. We demonstrate a concrete protocol that when $f$ exceeds $t$, at least $t+1$ of culpable validators can be identified (with cryptographic integrity) by at least $2t+1鈭抐$ honest validators individually.

Motivation

An important property satisfied by any Byzantine fault tolerant consensus protocol is agreement, which requires honest parties to not decide on conflicting values. Depending on the network model, typical consensus protocols tolerate only a fraction of Byzantine parties. In particular, DiemBFT can tolerate $t=\frac{n}{3}$ Byzantine validators. If the number of Byzantine validators exceeds this threshold, the protocols do not provide safety (or liveness).

Motivated by situations where $\ge\frac{n}{3}$ Byzantine validators have successfully mounted a safety attack, this DIP focuses on providing forensic support to identify the validators that acted maliciously. Specifically, we demonstrate that strong forensic support can be provided for DiemBFT as presently implemented (no changes necessary): there exist $2t+1-f$ honest validators who can individually identify at least $t+1$ Byzantine validators and provide irrefutable evidence to an external client.

Safety Violation on Diem

BFT assumption

The safety of Diem is based on a key assumption: if there are $f$ Byzantine validators, there exist $2f + 1$ honest validators; so $f < \frac{n}{3}$. Mathematically, the following two lemmas underpin the safety guarantee.

  1. There can only be one certified block per round.
  2. Two blocks committed in different rounds must belong to the same chain.

Informally, the first lemma holds because Diem ensures that an honest validator only votes once in a round. Therefore, when the adversary does not corrupt more than $t$ validators, there cannot be two conflicting quorum certificates (QCs) within a round, since two conflicting QCs indicate $t+1$ validators vote for two conflicting blocks, which means $t+1$ validators are corrupted.

For the second argument, when the adversary does not corrupt more than $t$ validators, Diem ensures safety across rounds by a simple voting rule. Whenever validators receive a block, they maintain a preferred round, defined as the highest known grandparent round (also called a lock). The rule is that validators vote for a block if its parent round is at least the preferred round [1]. If a QC commits block $B$ in round $e+2$ (the QC is called commitQC for block $B$), at least $2t+1$ validators remember $e$ as their preferred rounds. In rounds higher than $e$, there could not be a QC on blocks that are not descendants of $B$, since such a quorum certificate means $t+1$ validators (among $2t+1$ validators) who lock on $(B,e)$ vote for block $B'$ on a conflicting chain as $B$.

Safety violation

When the assumption does not hold, there are exactly two possible types of safety violations (Figure 1).

  1. Safety violation within a round: two different blocks are committed in a round
  2. Safety violation across rounds: two blocks that belong to two conflicting chains are committed in different rounds.

Figure 1. Two types of safety violations.

For the first case, when the adversary corrupts $\geq t+1$ validators, they can vote for two conflicting blocks, $B$ and $B'$ in the same round. Let us split $2t$ honest validators into two sets $P$ and $Q$. If the leader is corrupted, it can equivocate to validators in $P$ and $Q$. As a result, validators in $P$ vote for $B$ and validators in $Q$ vote for $B'$. Two conflicting blocks thus both get $2t+1$ votes, which are enough to form two conflicting QCs.

For the second case, when the adversary corrupts $\geq t+1$ validators, these validators can vote for a conflicting block $B'$ despite having a preferred round $e$ on block $B$. Again, let us split $2t$ honest validators into two sets $P$ and $Q$. Validators in $P$ vote for block $B$ in round $e+2$ and also lock on it whereas validators in $Q$ are not aware of block $B$ or the corresponding lock. Then in some higher round, a leader may propose a conflicting block $B'$ with an outdated QC. Validators in $Q$ are not aware of the lock on $B$ so they can vote for $B'$ along with the $t+1$ corrupted validators who violate the voting rule. As a result, the leader can collect $2t+1$ votes to form a QC in the higher round (also called a prepareQC). This QC is sufficient to unlock any honest validator and subsequently validators can commit a conflicting block.


Forensic Support for Diem


In the analysis of safety violation above, we observe that the corrupted validators must send certain messages in order to break safety: 1) to break safety within a round, corrupted validators vote for two conflicting blocks in the round, 2) to break safety across rounds, they vote for a conflicting block despite having a lock on a conflicting block (thus violating the voting rule). Those messages are signed by their secret keys and hence can serve as irrefutable evidence of their misbehavior. In addition, no honest validator will vote twice within a round or violate the voting rule, therefore we will only hold corrupted validators culpable.

Armed with this intuition, the forensic protocol for DiemBFT is the following:

  1. To identify disagreement and trigger the forensic protocol, two conflicting commitQCs need to be provided to the forensic protocol as input.
  2. If two commitQCs are within a round, the validators who vote for both commitQCs are corrupted validators.
  3. If two commitQCs are across rounds, we denote by commitQC1 the lower one and commitQC2 the higher one. We query all validators for a prepareQC (denoted by prepareQC#) such that it is later than commit1, the block $B'$ it votes for conflicts with the block $B$ of commitQC1, and the parent of $B'$ is before $B$. The validators who vote for commitQC1 and prepareQC# are corrupted validators.

There are $n=3t+1$ validators in the system. In both of the cases, the process of identifying culpable Byzantine validators involves performing appropriate quorum intersections: since two quorums of $2t+1$ validators intersect in $t+1$ validators, we are able to identify $\geq t+1$ Byzantine validators. Who are the witnesses? For conflicting commits within a round, the commitQCs themselves can prove culpability. For conflicting commits across rounds, honest validators having access to prepareQC# are witnesses. It turns out that the existence of commit2 implies that $2t+1$ validators should have received prepareQC#, out of which $f$ of them are Byzantine. The remaining $2t+1-f$ validators can act as witnesses. This also implies that the forensic support holds only when $f < 2t+1$; when the number of Byzantine validators are higher, we may have no witnesses.

An example attack is depicted in Figure 2, where conflicting commits across rounds occur; here $t+1$ red validators are corrupted. An honest validator commits $B$ in round $e$ on receiving commitQC1. The formation of commitQC1 indicates blue validators are locked on $(B,e)$. Some rounds later, a malicious leader proposes a conflicting block $B'$ and sends the proposal to another set of honest validators. According to the voting rule, green validators will vote and a higher prepareQC# for $B'$ is formed. After two rounds of voting, $B'$ is committed by another honest validator. In this example, red validators are held accountable since they vote for both commitQC1 and prepareQC#. Blue and green validators have access to the evidence.

Figure 2. An attack in action, where red validators are held culpable for sending stale prepareQC during view change, blue and green validators are witnesses.

Specification


The forensic module can be added on top of Diem without touching the existing codebase. It consists of two components, a database (FORENSIC_STORE) used to store quorum certificates received by validators, which can be accessed by clients through JSON-RPC requests, and an independent DETECTOR run by clients to analyze the forensic information.

Figure 3. Forensic module on top of Diem architechture.

A demo dashboard below shows the information collected by the forensic protocol and the analysis results.

Figure 4. Dashboard to display forensic information.

Implementation


A reference implementation is available at the following URL: https://github.com/wgr523/libra


Compatibility


The forensic module does not conflict with existing versions of Diem. A Diem validator that supports forensics appears no different within the Diem payment network (DPN) to Diem validators running older software versions. Diem validators that support forensics can receive RPC requests from clients and reply with messages. Obviously, older validators are not capable of replying to forensic requests. If a client who wants to use forensic features has no peers that support forensics, then it cannot gather enough information to have forensic support thus naturally decays to older clients.


References


  1. State Machine Replication in the Diem Blockchain. https://developers.diem.com/docs/technical-papers/state-machine-validatortion-paper/
  2. BFT Protocol Forensics. https://arxiv.org/abs/2010.06785
@simplespy simplespy added the enhancement New feature or request label Jan 6, 2021
@dahliamalkhi
Copy link
Contributor

Thanks @simplespy for an excellent description and an interesting proposal!

@aching @davidiw @zekun000 @dmitri-perelman any comments?

@aching
Copy link
Contributor

aching commented Jan 11, 2021

Thank you for the very detailed proposal and sample implementation @simplespy!

We've been thinking along the lines of how to identify byzantine behavior (even before a safety violation actually occurs) and how to report it. The obvious cases are both safety related (e.g double voting, voting on non-increasing rounds, etc.) and liveness related (e.g. proposing without a valid round proof).

This proposal fits in with this area of work. We'll want to study how to put this all together into a holistic forensics protocol that does its best to avoid slowing down performance (for instance - maybe async writes to the db for example) and the right place for the forensic APIs (e.g. forensic_get_latest_round and forensic_get_quorum_cert_at_round) as validators do not expose this JSON-RPC APIs outside of the validator network.

cc @mimoo - who was thinking about Byzantine logging

Perhaps we can follow up with some thoughts on how this whole area can make progress in a few days?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants