I was reading through Paxos made simple and I really wished there were state diagrams to help explicate the protocol. So I wrote them up and share them below. Please keep in mind that the diagrams just explore naive Paxos, that is single value, no distinguished proposer or distinguished learner. So this version of Paxos is pretty useless in practice but it completely captures the core mechanisms that make Paxos work (with the exception of how to pick a distinguished learner). Please note that this article is intended to be used by someone going through Paxos made simple. It is an adjunct, not a replacement.
1 Modeling assumptions
Following the lead of the paper, messaging is modeled as one way unicast with guaranteed reliability and with no repeats. Guaranteed reliability, btw, only means that the message will get there, it says nothing about when.
One way messaging also means that unless the receiver of the message explicitly decides to respond to the message there is no way for the sender to know when the message arrived or what happened as a consequence of receiving it.
Sending a message is executed as sendMessage(address, purpose of message, arguments...). Receiving a message is executed as messageReceived(purpose of message, arguments...).
The state diagrams below are assumed to be single threaded. In cases where an entity is waiting for messages only one message will be processed at a time and any remaining messages that have arrived will be put into an infinitely big queue to be processed one by one.
Finally, any time a value is assigned it is assumed to be assigned in some suitably persistent way (e.g. written to a disk, stored on tape, chiseled into a stone tablet, etc.)
These simplifications do not alter the algorithm, they just make the state diagrams easier to read.
We begin by assuming that some client somewhere instantiates a proposer. How this happens is out of scope for the algorithm definition.
Notice that there is no failure logic in the state machine. This is because the default algorithm doesn't send error messages when requests are rejected. So if a quorum's worth of the proposal requests aren't accepted then the proposer will stay in the wait state forever.
Similarly there is no obvious way for the proposer to know if their accept request was accepted by a quorum of acceptors. First, acceptors don't send errors for rejected requests and second they send acknowledgements of accepted requests to learners, not proposers.
None of these issues alter the algorithm and there are plenty of ways to deal with them in practice so we follow the Paxos made simple paper and ignore them.
The arguments passed to the proposer are:
acceptorList This is the list of all acceptors in the cluster. Note that in multicast based systems this could just be the multicast address but in that case there would need to be a fourth argument to record how many acceptors there are in the cluster.
proposalNumber The proposal number that the proposer is supposed to start out proposing. Note that Paxos requires that different proposers use unique proposal numbers but doesn't specify how this is to occur. There are a number of ways to achieve this but they aren't important for understanding the core algorithm. The key thing to understand is that the initial proposalNumber will be unique amongst all other proposals.
proposalValue This is the value that is supposed to be proposed.
proposerAddress This is the address to which messages to the proposer can be sent, this is needed by acceptors to send in responses to successful prepare messages.
The proposer also has the following local variables:
acceptedPrepareCount This records how many acceptors have accepted the proposer's prepare request
seenAcceptedProposalNumber This is the highest proposal number that was returned in a response to a prepare message seen so far.
2.2 A question of identity
There is an assumption that the proposerAddress is absolutely unique and used only once with a single proposal. Put another way, each time a proposer is instantiated it conceptually created a universally unique address for itself that it will never re-use for any other proposals. That way when it receives a message it knows that the message is for it. This is important because otherwise a proposer could receive a message that got 'stuck' in the system for an old version of itself that was proposing something else and got re-used for this proposal.
Of course in practice one doesn't have to get anywhere near this fancy. Just sticking the proposalNumber being responded to in the ProposalAccepted message would a long way to taking care of the potential naming issues. But it would also make the diagram more complex.
It's interesting to speculate however as to what would happen if two instances of a proposer existed which had the same address. This is entirely plausible, especially in a distributed cloud environment. This isn't something the algorithm was designed for, there is an assumption that messages fined their way to the 'right' place, eventually. As long as messages don't get duplicated (which, for protocols like UDP is also entirely plausible) it's probably isn't a big deal but if message can get duplicated then the protocol could entire an 'impossible' state where two or more acceptors have accepted the same proposalNumber but with different values. That isn't a legal state in the protocol.
The guard conditions at the start of AcceptPrepare and AcceptAccept should really be on the links and not in the states. But doing it 'right' made the picture enormous so I had to push them into the states for readability.
When an acceptor is created it is passed in:
learnersList Specifies the list of entities to notify when a value is accepted.
acceptorAddress The acceptor's own address, this is used when notifying learners
Other than talking about distinguished learners in the context of scalability the Paxos Made Simple paper doesn't actually say much about learners. So I've taken a very conservative approach and just assumed that learners just record what they learn, no more. There is one state variable, learnedValue. It's a dictionary where the key is an acceptor's address and the value is whatever value the acceptor has accepted. Presumably anyone who wants to use what a learner has learned would look at the dictionary and see if any value in the dictionary is quorate.