My technical wanderings of late at Microsoft have taken me into the realm of massively distributed storage. Of course, I've been here before, but this time I need to bring some other folks along, so I was asked to put together a list of suggested readings to help people come up to speed. I thought the list might be of general interest, so I'm posting it here.
What do you think? Is this a good list? A bad one? What would you suggest?
An alternative way to build large-scale stores, one that gives up strong consistency and is worth keeping in mind - Dynamo
Working in highly distributed storage systems almost certainly means dealing with Paxos (or one of its variants), so - Paxos Made Simple (make sure to check out my companion article as well) & an outstanding Google paper on Paxos called Paxos Made Live.
I suspect that anyone who works in this area for any length of time will end up reading the following articles, but they aren't necessary from day one:
Wikipedia has a great summary of the many variants of Paxos that are running around.
Chubby is Google's abstraction layer to make Paxos easier for developers to handle.
A great summary of ZooKeeper, which is Hadoop's answer to Chubby.
A summary of ZAB, the 'Paxos-like' mechanism that is underneath ZooKeeper.