Internet Protocols – Stuff Yaron Finds Interesting

The block chain and the CAP Theorem

In this article I argue that depending on how one programs one’s client, one can build a Consistent and Partition Tolerant or Available and Partition Tolerant system on top of Bitcoin or really any block chain. And no, that isn’t a contradiction and no this doesn’t violate the CAP theorem.

[Note: Lots of updates in response to feedback]

Continue reading The block chain and the CAP Theorem

How to make block chains strongly consistent

Block chains can be strongly consistent but not in the normal deterministic sense we are used to from consensus protocols like Paxos. Instead the strong consistency is probabilistic. That is, if one waits for long enough then the probability of a particular transaction being removed from the chain falls to negligible levels. This then provides the basis for a protocol that can treat the block chain as strongly consistent, where the protocol is “Wait until a transaction is X blocks deep in whatever chain one sees”.

Going off chain for storage

It turns out that building a distributed database with ACID behavior (aka a block chain) isn’t easy, it requires a lot of code and a lot of processing. As a result block chains like Bitcoin can process around 4 transactions/second. A pretty slow pace for a globe spanning system. To work around this and other issues I explore below, we hear more and more about off block chain storage. But it turns out that if you store off the chain then you lose the chain’s ACID guarantees. In many cases that loss is fine but it does call into question if the use case that can leverage off chain storage really needs the chain at all.

[Update: Thank you to Shawn Cicoria for pointing out that my original price quote for storing a Gigabyte of data in Ethereum was off by a factor of 32. My mistake is that I did the calculations forgetting that the gas price is per Ethereum Word which is 32 bytes.]

Why are enterprises interested in the block chain?

My day job has required me to look into issues related to using the block chain in the enterprise. This lead me to a simple question - why would an enterprise be interested in the block chain for running its own business?

State diagrams for Paxos made simple

I was reading through Paxos made simple and I really wished there were state diagrams to help explicate the protocol. So I wrote them up and share them below. Please keep in mind that the diagrams just explore naive Paxos, that is single value, no distinguished proposer or distinguished learner. So this version of Paxos is pretty useless in practice but it completely captures the core mechanisms that make Paxos work (with the exception of how to pick a distinguished learner). Please note that this article is intended to be used by someone going through Paxos made simple. It is an adjunct, not a replacement.

Wrapped or Native Paxos?

So let's say I want to build a nice highly consistent multi-data center store, something like Megastore. Most everyone at this point has something like Bigtable already deployed in their data centers. What they typically don't have is a way to keep different instances of their table stores guaranteed consistent with each other across DCs. Megastore steps in to address this issue. But this begs a fundamental question - what's better, to wrap a Paxos coordinator on top of existing table stores or to build a new Paxos native storage service?

Average, percentiles and measuring service performance

Measuring the performance of services is tricky. There is an almost irresistible desire to measure average performance. But measuring service performance using averages is pretty much guaranteed to provide misleading results. The best way (I know of anyway) to get accurate performance results when measuring service performance is to measure percentiles, not averages. So Do Not use averages or standard deviations, Do use percentiles. See below for the details.

Distributed Storage Reading List

My technical wanderings of late at Microsoft have taken me into the realm of massively distributed storage. Of course, I've been here before but this time I need to bring some other folks along. So I was asked to put together suggested readings to help people come up to speed. I thought the list might be of general interest so I'm posting it here.

What do you think? Is this a good list? A bad one? What would you suggest?

OAuth 2.0 Bearer tokens – unsafe at any speed?

Eran's latest article raises a number of specific security threats by way of arguing that bearer tokens are irredeemably insecure. In this article I examine the attacks Eran calls out and demonstrate that they are already addressed by OAuth 2.0. Eran's article does bring up the interesting question of - do we need defense in depth for the tamper resistance and confidentiality provided by SSL/TLS?

Why does OAuth need request tokens?

OAuth's current access dance is based getting a request token that is later exchanged for an access token. Introducing the request token takes what could have been a 4 round trip protocol and makes it into a 6 round trip protocol. Couldn't we just simplify OAuth down to 4 round trips by getting rid of the request token all together? Or is there some critical use case enabled by request tokens that makes all the complexity worth the price?

[5/26/2009 – Updated with Q&A on open redirectors]

[6/2/2009 – Updated with a note from Allen Tom on another way to prevent open redirector attacks]

Continue reading Why does OAuth need request tokens?