My technical wanderings of late at Microsoft have taken me into the realm of massively distributed storage. Of course, I've been here before, but this time I need to bring some other folks along. So I was asked to put together a list of suggested readings to help people come up to speed. I thought the list might be of general interest, so I'm posting it here.

What do you think? Is this a good list? A bad one? What would you suggest?

Required Reading

Google's Stack - GFS, Bigtable & Megastore

An alternative approach to building large-scale stores, one that trades strong consistency for eventual consistency and is worth keeping in mind - Dynamo

Working on highly distributed storage systems almost certainly means dealing with Paxos (or one of its variants), so - Paxos Made Simple (make sure to check out my companion article as well) & an outstanding Google paper on Paxos called Paxos Made Live.

Extra Credit

I suspect that anyone who works in this area for any length of time will end up reading the following articles, but they aren't necessary from day one:

Wikipedia has a great summary of the many variants of Paxos that are running around.

Chubby is Google's lock service, an abstraction layer that makes Paxos easier for developers to handle.

A great summary of ZooKeeper, Hadoop's answer to Chubby.

A summary of ZAB, the 'Paxos-like' mechanism that underlies ZooKeeper.

2 thoughts on “Distributed Storage Reading List”

  1. Hi Yaron, note that the link to the Google paper on Paxos points to Lamport’s homepage on MSR. I assume it’s not intentional?

    1. Eeek! Thank you for catching this. It was a fat-fingered cut-and-paste error on my part. It’s now fixed.

