Block chains can be strongly consistent but not in the normal deterministic sense we are used to from consensus protocols like Paxos. Instead the strong consistency is probabilistic. That is, if one waits for long enough then the probability of a particular transaction being removed from the chain falls to negligible levels. This then provides the basis for a protocol that can treat the block chain as strongly consistent, where the protocol is “Wait until a transaction is X blocks deep in whatever chain one sees”.
Absolutely nothing said in this article is new. The point that block chains (and Bitcoin in particular) can be strongly consistent was made fairly emphatically but briefly here. A more extensive analysis is provided here. My goal in this article is just to provide an explanation for the source of consistency that is a bit more detailed than the first link and much less detailed than the second.
This article assumes the reader is familiar with strong consistency as well as the basic design principles of block chains like Bitcoin. Brief reminders are provided below to refresh rusty memories.
2 Defining Strong Consistency
Imagine we have two clients, CA and CB, and a single server, S.
Whenever CA and CB want to read/write to a shared state they will send all their reads and all their writes to S.
This approach is great because CA can be absolutely sure that once its write is accepted by S then any future reads from CB will see CA’s write.
But what happens if S fails? Or what happens if network connectivity between CA or CB and S is lost?
The system may be easy to reason about but it’s not terribly robust.
Now imagine that we replace server S with three servers, SA, SB and SC.
One can think of strong consistency as guaranteeing that for a pool of machines (in this case SA, SB and SC) their collective behavior will be indistinguishable to clients (in this case CA and CB) from having only a single server (our original server, S).
So in a strongly consistent system one gets the benefit of the robustness of having multiple servers while still having the same data model as if there was only one server. So the clients don’t care that they are talking to a group of machines. To them the experience will be the same as if all requests went to a single machine.
It’s worth pointing out that strong consistency doesn’t guarantee that the whole system can’t fail. Just like server S could go down and now nobody can read/write anything, so it is the case in distributed systems that if enough machines go down then the system will stop being able to handle read/writes in a strongly consistent manner. Strong consistency just guarantees equivalent behavior to a single server, that is, if enough machines are available the system is functional and if enough aren’t available then the system will visibly fail.
3 Deterministic strong consistency
The granddaddy of strong consistency in a distributed system is Paxos. For those who aren’t familiar with Paxos you can read Paxos Made Simple. I even made some ugly pictures to go along with the article. They helped me, at least, better understand the article.
The way this all works is that Paxos depends upon having a known membership list of the machines in its quorum (where the quorum is how we refer to the group of machines, the servers if you will, that are trying to maintain the shared state). [A] [A] It is possible to correctly implement Paxos in a multicast system where the exact identities of all members aren’t known, but you do need to know the number of members and membership can’t change without going through Paxos itself. So the multicasting point is really just nitpicking.
Paxos works by ensuring that a write will only be accepted if a majority of the quorum confirm they have accepted it. Reads work the same way, that is, a reader would only accept a value as being correct if that value was returned by a majority of the quorum members.
There are mountains of papers written about optimizations to Paxos but the above is the core idea.
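The majority rule at the heart of the idea can be sketched in a few lines. This is a toy illustration, not a real Paxos implementation: the class and method names are mine, and a real Paxos write would not partially apply when it fails to reach a majority.

```python
from collections import Counter

class Quorum:
    """Toy sketch of majority-quorum reads and writes."""

    def __init__(self, n_servers):
        # Each "server" is just a dict holding its latest accepted values.
        self.servers = [dict() for _ in range(n_servers)]
        self.majority = n_servers // 2 + 1

    def write(self, key, value, reachable):
        # A write only counts as accepted if a majority of servers took it.
        # (A real protocol would also roll back or retry a partial write;
        # this sketch elides that.)
        accepted = 0
        for i in reachable:
            self.servers[i][key] = value
            accepted += 1
        return accepted >= self.majority

    def read(self, key, reachable):
        # A reader only trusts a value returned by a majority of servers.
        votes = Counter(self.servers[i].get(key) for i in reachable)
        value, count = votes.most_common(1)[0]
        return value if count >= self.majority else None

q = Quorum(3)
assert q.write("x", 42, reachable=[0, 1])    # 2 of 3 reachable: majority, succeeds
assert not q.write("y", 7, reachable=[2])    # 1 of 3 reachable: fails
assert q.read("x", reachable=[0, 1, 2]) == 42
```

Note how the read of "x" succeeds even though server 2 never saw the write: any majority of readers must overlap any majority of writers in at least one server, which is exactly why majorities give single-server-equivalent behavior.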
The reason Paxos is interesting is that it was the first consensus protocol that was provably correct. As long as the members of the system follow the protocol (and ignoring attacks), one is guaranteed that the system will always have strong consistency.
4 Probabilistic strong consistency
Block chains are intended to be completely decentralized. There is no central organization that decides who can be part of the set of machines that maintain the block chain. So right off the top the quorum mechanism of strong consistency won’t work because by definition we don’t know who is in the quorum. Since there is no central control of any kind anyone can leave or join at any time.
4.1 Nakamoto Consensus
This has led to what is called Nakamoto consensus after the creator of Bitcoin who first proposed it. The idea behind Nakamoto consensus is to allow for consensus among a group of machines that don’t trust each other, may have some very bad actors and where there is no central organizer to control who can and cannot join the system.
In block chains there is a competition (at least in proof of work systems, proof of stake works differently) to create new blocks. Typically different groups of what are called miners are all racing to solve a proof of work problem off the current consensus chain [B] [B] Systems like Bitcoin have what are called checkpoints that identify specific chain histories as being canonical. These checkpoints are published through a centralized mechanism (the reference client). This is effectively strong consistency by fiat. Which is legal but doesn’t exactly prove out the Bitcoin contention that it is a fully distributed system. So the issues discussed in this article just deal with what happens between checkpoints. Note however that as cheating goes this isn’t particularly bad. One shouldn’t let theory get in the way of a billion-dollar working system. A problem is that it’s possible for two different groups of miners to both solve the proof of work for the next block at the same time. This creates two alternate versions of the block chain.
Typically the rule is that a miner will accept whichever chain they know about that is the longest [C] [C] And the hardest. We’ll talk more about this later. But if the miner finds out about multiple equally long but different chains then we are in an indeterminate situation.
The way this situation is resolved is where probabilities come in.
Proof of work involves solving problems (typically something with hashing) that are known to take serious computational resources. In general the probability of solving the problem first is a function of the computational resources one has. So if one controls, say, 30% of the computational resources of all members of the block chain ecosystem then one should, on average, “win” the proof of work race 30% of the time. But this is only an average. It is possible to get lucky and solve the problem faster than others regardless of the percentage of computational power one has. Equally one can get unlucky and take much longer.
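The proportionality between hash power and win rate is easy to see in a toy simulation. The weighted draw below stands in for the actual hash race (each round, the chance a miner finds the next block equals its share of the total hash rate); the function names and the 30%/70% split are illustrative, not from any real system.

```python
import random

def mine_round(shares, rng):
    # Pick this round's winner in proportion to hash power. This is
    # equivalent to each miner racing an independent exponential clock
    # whose rate is its fraction of the total hash rate.
    return rng.choices(range(len(shares)), weights=shares)[0]

rng = random.Random(0)
shares = [0.3, 0.7]  # miner 0 controls 30% of the hash power
wins = [0, 0]
for _ in range(10_000):
    wins[mine_round(shares, rng)] += 1

# Over many rounds miner 0 wins roughly 30% of the time, but any single
# round can go either way -- that is the "luck" described above.
print(wins[0] / 10_000)
```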
Let’s take a situation where there are only two miners and each controls exactly 50% of the computational resources. Each one is trying to mine the next block and they both end up mining a block at the same time. This creates a fork since there is no way to decide which of their forks is authoritative. In theory, since their resources are identical, they could keep on each mining blocks at the same time and the fork could go on forever. But the probability of this is very low. Thanks to luck it is inevitable that one miner will produce a block before the other, successfully communicate this fact to the other miner and thus have the longest chain and thus become authoritative.
Thanks to probabilities, eventually someone mines a new block soon enough, and communicates it widely enough, to have the longest chain, and everyone will then switch to mining on that chain.
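The 50/50 scenario above can be sketched as a coin-flip model: each round the two equal miners either tie again (extending the fork) or one pulls ahead (resolving it). The fork length is then geometrically distributed, which is why a never-ending fork has probability zero in the limit. This is a sketch under that simplified tie-or-resolve assumption, not a model of real network propagation.

```python
import random

def fork_length(p_tie, rng):
    # Each round the two equal miners either tie again with probability
    # p_tie, extending the fork by one block, or one pulls ahead and the
    # fork resolves.
    length = 1
    while rng.random() < p_tie:
        length += 1
    return length

rng = random.Random(1)
lengths = [fork_length(0.5, rng) for _ in range(100_000)]

# P(fork survives k more rounds) = p_tie**k, so long forks are
# exponentially rare; with p_tie = 0.5 the mean fork length is 2.
print(max(lengths), sum(lengths) / len(lengths))
```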
4.2 The user eye view
Imagine you are someone who is trying to put a transaction on the chain. You give your transaction to a miner who sticks it in their block. But a fork happens and your transaction is in the block from one group of miners but is not in a block from another group of miners in a forked chain. You can’t know ahead of time which fork will win so you have no idea if your transaction will become authoritative. [D] [D] A variant of this happens in Paxos. If a proposer sends out a proposed value it can’t know if its value will be accepted until it gets a quorum of responses which depending on machine failures and network failures could be an indeterminate amount of time. Even if a majority of the quorum is functional it is still possible that the proposal gets beat out by another proposal and has to be tried again. The point is that there is a set of logic one has to follow which may or may not result in one’s value being accepted. But what one is guaranteed is either the value will be accepted (forever) or it will be rejected. There is a final, deterministic, outcome.
The probabilistic nature of the proof of work means that eventually someone is going to win and have the longest chain. But how do you define “win”?
Remember, we don’t know who all the miners are and what they have seen. Furthermore due to communication and other issues it’s quite possible for forks that are 3 or more blocks deep to happen. So how does one know when it’s safe to say “My transaction is official”?
According to this paper the rough probability of a fork n blocks long existing is O(2^-n). That is, each block added exponentially reduces the probability of another forked chain of that size existing.
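Under that rough estimate one can back out a comfort depth for a given fork-probability budget. The constants hidden in the O(·) are ignored here, so treat the numbers as illustrative, not as real security guarantees; the function name is mine.

```python
import math

def blocks_for_confidence(epsilon):
    # Smallest depth n with 2**-n <= epsilon, per the rough O(2**-n)
    # fork-probability estimate (constants ignored).
    return math.ceil(-math.log2(epsilon))

print(blocks_for_confidence(1e-2))  # 7 blocks for a 1-in-100 budget
print(blocks_for_confidence(1e-6))  # 20 blocks for a 1-in-a-million budget
```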
So in a deterministic system like Paxos one knows one’s value is right (for either reading or writing) once a majority of the known quorum of members have agreed.
In a block chain one treats a value as being right (again, for reading or writing) only once it is enough blocks in that the probability of a fork existing that doesn’t include that block is acceptably small.
So we ignore the last X blocks at the tail of the chain, where X is set to whatever our threshold of comfort is. Any transaction that isn’t X+1 blocks deep or more is ignored.
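The “ignore the tail” rule reduces to a one-line check. The function name and parameters here are mine; the default X = 6 mirrors the traditional Bitcoin wait discussed below.

```python
def is_confirmed(tx_block_height, chain_tip_height, x=6):
    # A transaction counts as settled only once X blocks have been mined
    # on top of its block, i.e. it sits X+1 or more blocks deep in the
    # chain the client currently sees.
    return chain_tip_height - tx_block_height >= x

# A transaction mined at height 100 is settled once the tip reaches 106.
assert is_confirmed(tx_block_height=100, chain_tip_height=106)
assert not is_confirmed(tx_block_height=100, chain_tip_height=105)
```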
In the original Bitcoin reference client apparently the number of blocks to wait was set at 6. Looking here one can see that there is only one recorded instance on 7/3/2015 of a 7 block fork. Which would rather put paid to the traditional limit of 6. But I’ve had trouble finding any more data on this orphan. So what should X be? Beats me.
4.3 Computational difficulty
Conceptually this can feel sort of weird. Couldn’t an attacker take the consensus chain and just shove X+1 blocks onto it and thus fake out a user who only honors transactions X+1 blocks or more back?
The answer is - probably not (there is that probability thing again).
Block chains are supposed to adjust the difficulty of their proof of work based on a rough measure of how much computational power is in the system. If that adjustment is accurate and if an attacker has a relatively small amount of processing power compared to the rest of the ecosystem then the attacker won’t be able to complete the proof of work at the appropriate level of difficulty faster than the legitimate miners.
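The Bitcoin whitepaper quantifies this as a gambler’s-ruin race: an attacker controlling fraction q of the hash power (with q < 1/2) who is z blocks behind the honest chain ever catches up with probability (q/p)^z, where p = 1 − q. A small sketch of that formula:

```python
def catch_up_probability(q, z):
    # Gambler's-ruin result from the Bitcoin whitepaper: an attacker with
    # hash-power fraction q who starts z blocks behind the honest chain
    # catches up with probability (q/p)**z where p = 1 - q. With a
    # majority (q >= p) the attacker eventually catches up with certainty.
    p = 1 - q
    return 1.0 if q >= p else (q / p) ** z

print(catch_up_probability(0.1, 6))  # a 10% attacker 6 blocks behind: ~2e-6
print(catch_up_probability(0.3, 6))  # a 30% attacker 6 blocks behind: ~6e-3
```

This is where the exponential decay in the attacker’s odds comes from, and why waiting more blocks is the user’s main defense against all but a majority attacker.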
This defense however assumes that the user trying to validate their transaction has a reasonable idea of what the level of difficulty is supposed to be. This is bootstrapped in Bitcoin by putting checkpoints (already mentioned) directly into the reference client.
But it’s worth taking a moment to talk about how block chains deal with changes in computational power and how it affects the difficulty level of the proof of work. As a block chain gets more popular it will attract more miners thus increasing the computational power available and thus increasing the probability of blocks being found and long forks happening.
Bitcoin deals with this via an algorithm built into its design: every 2016 blocks it calculates the average rate at which blocks have been generated and then sets a new level of difficulty for the proof of work problem that will keep the rate at which blocks are found at roughly one every 10 minutes. This means the adjustment should happen every 2016 blocks * 10 minutes/block = 20,160 minutes = 2 weeks.
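The retarget rule can be sketched as follows. This is a simplification: the real client clamps each adjustment to a factor of 4 and operates on the compact target encoding in the block header rather than on a “difficulty” float.

```python
TARGET_SECONDS = 10 * 60   # desired block interval
RETARGET_BLOCKS = 2016     # blocks between difficulty adjustments

def new_difficulty(old_difficulty, actual_seconds):
    # Scale difficulty by how fast the last 2016 blocks actually arrived
    # versus the two-week target: blocks arriving too fast means more
    # hash power, so difficulty goes up, and vice versa.
    expected = TARGET_SECONDS * RETARGET_BLOCKS  # 1,209,600 s = 2 weeks
    return old_difficulty * expected / actual_seconds

# If the last 2016 blocks took only one week, difficulty doubles.
print(new_difficulty(1.0, 604_800))
```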
As can be seen here the rate at which the difficulty changes isn’t smooth. Some periods have a lot of new mining power, some have smaller increases.
Of course there are attacks. Just to give a single example, if someone was willing to build a huge pool of potential mining power but keep it offline then they would have at least two weeks to really screw with the world. But this would be expensive and self-defeating since it would likely crash the whole system. Of course that could be the point (aka a Goldfinger attack).
Historically, strong consistency was achieved deterministically. Follow the algorithm and you knew you were good. But in a system like a block chain with an unknowable membership, deterministic algorithms don’t work. So instead block chains can create strong consistency using a probabilistic model where after a certain number of blocks the probability of a value being removed from the chain due to a competing fork is small enough that it’s not worth worrying about. So the trick is to ignore the tail of the chain until one reaches a depth one is comfortable with.